nltk.probability.WittenBellProbDist
- class nltk.probability.WittenBellProbDist
Bases: ProbDistI
The Witten-Bell estimate of a probability distribution. This distribution allocates uniform probability mass to as yet unseen events, using the number of distinct event types observed so far to estimate how likely a new type is. The probability mass reserved for unseen events is equal to T / (N + T), where T is the number of observed event types and N is the total number of observed events. This equates to the maximum likelihood estimate of a new type event occurring. The remaining probability mass is discounted such that all probability estimates sum to one, yielding:
p = T / (Z (N + T)), if count = 0
p = c / (N + T), otherwise
where c is the count of the sample and Z is the number of event types that have not been observed, so that the unseen events share the reserved mass equally.
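The two cases above can be sketched in plain Python. This is a minimal illustration of the formula only, not NLTK's implementation: the helper name `witten_bell_prob` is hypothetical, and it assumes that Z is the number of unseen event types, that is, the number of possible types minus T.

```python
from collections import Counter


def witten_bell_prob(counts, bins, sample):
    """Witten-Bell estimate for one sample (illustrative helper, not NLTK code)."""
    n = sum(counts.values())          # N: total number of observed events
    t = len(counts)                   # T: number of observed event types
    z = bins - t                      # Z: assumed to be the number of unseen types
    c = counts[sample]                # c: count of this sample (0 if unseen)
    if c > 0:
        return c / (n + t)            # seen event: discounted relative frequency
    return t / (z * (n + t))          # unseen event: equal share of T / (N + T)


counts = Counter("abracadabra")       # a:5, b:2, r:2, c:1, d:1, so N = 11, T = 5
bins = 26                             # assumption: a 26-letter alphabet

p_seen = witten_bell_prob(counts, bins, "a")      # 5 / (11 + 5) = 0.3125
p_unseen = witten_bell_prob(counts, bins, "z")    # 5 / (21 * 16)

# Summing over every possible event type should give (approximately) one.
alphabet = "abcdefghijklmnopqrstuvwxyz"
total = sum(witten_bell_prob(counts, bins, ch) for ch in alphabet)
print(p_seen, p_unseen, total)
```

The seen letters together receive N / (N + T) = 11/16 of the mass, and the 21 unseen letters split the remaining 5/16 equally.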
- __init__(freqdist, bins=None)
Creates a distribution of Witten-Bell probability estimates, following the formulas given in the class description above.
The parameters T and N are taken from the freqdist parameter (the B() and N() values). The normalizing factor Z is calculated using these values along with the bins parameter.
- Parameters
freqdist (FreqDist) – The frequency counts upon which to base the estimation.
bins (int) – The number of possible event types. This must be at least as large as the number of bins in the freqdist. If None, then it’s assumed to be equal to that of the freqdist.
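To show how the bins parameter might feed into the normalizing factor, here is a small sketch in plain Python rather than NLTK. It assumes Z = bins - T, which is inferred from the requirement that the estimates sum to one and is not stated in the parameter description. The helper `unseen_prob` and its choice to return None when no unseen types remain are hypothetical.

```python
from collections import Counter


def unseen_prob(counts, bins=None):
    """Probability given to each unseen type (hypothetical helper, not NLTK code)."""
    t = len(counts)                   # T: observed event types
    n = sum(counts.values())          # N: observed events
    if bins is None:
        bins = t                      # default: as many bins as the counts have
    if bins < t:
        raise ValueError("bins must be at least the number of observed types")
    z = bins - t                      # assumed: Z is the number of unseen types
    if z == 0:
        return None                   # no unseen types, so there is nothing to share
    return t / (z * (n + t))


counts = Counter(["the", "cat", "the", "mat"])    # N = 4, T = 3

# The reserved mass T / (N + T) = 3/7 is fixed; more bins means a thinner share each.
for bins in (4, 10, 100):
    print(bins, unseen_prob(counts, bins))

print(unseen_prob(counts))            # bins defaults to T, so Z == 0 in this sketch
```

In this sketch, leaving bins at its default gives Z = 0, so there are no unseen types to receive the reserved mass. Passing a bins value larger than the number of observed types is what makes the unseen case meaningful.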
- prob(sample)
Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1].
- Parameters
sample (any) – The sample whose probability should be returned.
- Return type
float
- max()
Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.
- Return type
any
- samples()
Return a list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.
- Return type
list
- discount()
Return the ratio by which counts are discounted on average: c*/c
- Return type
float
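As a worked illustration of the ratio c*/c, the sketch below derives it from the Witten-Bell formula for seen events, p = c / (N + T). It assumes the adjusted count c* is defined as p * N, which is one common reading and is not confirmed by the description above. It does not call NLTK, so it says nothing about what the library's method actually returns.

```python
from collections import Counter

counts = Counter("abracadabra")       # a:5, b:2, r:2, c:1, d:1
n = sum(counts.values())              # N = 11 observed events
t = len(counts)                       # T = 5 observed event types

# For each seen sample, compare the adjusted count c* = p * N with the raw count c.
ratios = {}
for sample, c in counts.items():
    p = c / (n + t)                   # Witten-Bell estimate for a seen sample
    c_star = p * n                    # assumed definition of the adjusted count
    ratios[sample] = c_star / c

# Under this reading every seen sample is scaled by the same factor N / (N + T).
discount = n / (n + t)                # 11 / 16 = 0.6875
print(ratios, discount)
```

Because the factor N / (N + T) does not depend on c, the average ratio over seen samples equals that same factor under these assumptions.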
- SUM_TO_ONE = True
True if the probabilities of the samples in this probability distribution will always sum to one.