nltk.probability.CrossValidationProbDist

class nltk.probability.CrossValidationProbDist

Bases: ProbDistI

The cross-validation estimate for the probability distribution of the experiment used to generate a set of frequency distributions. The "cross-validation estimate" for the probability of a sample is the average of the held-out estimates for that sample over every ordered pair of distinct frequency distributions, where one distribution of each pair serves as the base distribution and the other as the held-out data.
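The estimate described above can be sketched from scratch as follows. This is not NLTK's implementation: it assumes collections.Counter objects stand in for FreqDist, and the function names heldout_prob and cross_validation_prob are hypothetical. For a sample seen r times in the base distribution, the held-out estimate is T_r / (N_r * N), where T_r is the total held-out count of all values seen r times in the base, N_r is the number of such values (for r = 0, bins minus the number of distinct observed values), and N is the total held-out count. The cross-validation estimate then averages this over all ordered pairs.

```python
from collections import Counter
from itertools import permutations


def heldout_prob(base: Counter, heldout: Counter, bins: int, sample) -> float:
    """Held-out estimate T_r / (N_r * N), where r is the sample's count in `base`."""
    r = base[sample]  # Counter returns 0 for unseen samples without inserting them
    if r == 0:
        # Unseen values fill the bins not covered by `base` (assumes all counts in base are > 0).
        n_r = bins - len(base)
    else:
        n_r = sum(1 for c in base.values() if c == r)
    # T_r: total held-out count of all values whose base count is r.
    t_r = sum(c for s, c in heldout.items() if base[s] == r)
    n = sum(heldout.values())
    # Assumption: return 0.0 when no bin has count r (NLTK's handling may differ).
    return t_r / (n_r * n) if n_r > 0 and n > 0 else 0.0


def cross_validation_prob(fdists: list, bins: int, sample) -> float:
    """Average the held-out estimate over every ordered pair of distinct distributions."""
    estimates = [heldout_prob(base, held, bins, sample)
                 for base, held in permutations(fdists, 2)]
    return sum(estimates) / len(estimates)


fdists = [Counter("aab"), Counter("abc")]
for s in "abcd":  # four possible sample values, so bins=4
    print(s, cross_validation_prob(fdists, bins=4, sample=s))
```

With bins=4 the estimates are 1/3, 1/3, 1/4 and 1/12, which sum to one. The value "d" occurs in neither distribution yet still receives some probability mass.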

SUM_TO_ONE = False: This attribute is True when the probabilities of the samples in a distribution are guaranteed to sum to one. This class sets it to False, so no such guarantee is made.

__init__(freqdists, bins)

Use the cross-validation estimate to create a probability distribution for the experiment used to generate freqdists.

Parameters

freqdists (list(FreqDist)) – A list of the frequency distributions generated by the experiment.
bins (int) – The number of sample values that can be generated by the experiment described by the probability distribution. This value must be set correctly for the probabilities of the sample values to sum to one. The signature above gives bins no default value, so it should be passed explicitly rather than relying on a fallback to freqdist.B().
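To see why bins matters, the sketch below builds the table N_r (how many possible sample values occur exactly r times in one frequency distribution), taking N_0 as bins minus the number of distinct observed samples. It is comparable in spirit to what NLTK's FreqDist.r_Nr(bins) reports, but it is written against collections.Counter, and the function name count_of_counts is hypothetical.

```python
from collections import Counter


def count_of_counts(fdist: Counter, bins: int) -> dict:
    """Map each count r to N_r, the number of sample values seen exactly r times."""
    table = Counter(fdist.values())
    # Values never observed fill the remaining bins; a negative N_0 means bins is too small.
    table[0] = bins - len(fdist)
    return dict(table)


fd = Counter("aab")
print(count_of_counts(fd, bins=4))  # {2: 1, 1: 1, 0: 2}
print(count_of_counts(fd, bins=2))  # {2: 1, 1: 1, 0: 0}
```

With bins=2 the table reports no unseen values, so any held-out mass on values missing from fd has nowhere to go, and the estimates over all possible values can no longer sum to one.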

freqdists()

Return the list of frequency distributions that this ProbDist is based on.

Return type: list(FreqDist)

samples()

Return a list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.

Return type: list
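Assuming the samples are the values observed in at least one of the underlying frequency distributions, the list can be sketched as their union. Counter stands in for FreqDist, and observed_samples is a hypothetical name. Under this assumption, a value that occurs in none of the distributions may still receive nonzero probability from the held-out estimates without appearing in the result.

```python
from collections import Counter


def observed_samples(fdists: list) -> list:
    """Union of the samples seen in any distribution, in first-seen order."""
    seen = {}
    for fd in fdists:
        # dict.update keeps the original position of keys that are already present.
        seen.update(dict.fromkeys(fd))
    return list(seen)


print(observed_samples([Counter("aab"), Counter("abc")]))  # ['a', 'b', 'c']
```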

prob(sample)

Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1].

Parameters: sample (any) – The sample whose probability should be returned.
Return type: float

discount()

Return the average ratio c*/c by which counts are discounted, where c is an observed count and c* is the corresponding adjusted count.

Return type: float

generate()

Return a randomly selected sample from this probability distribution. The probability of returning each sample samp is equal to self.prob(samp).
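The sampling step can be sketched as follows, assuming the probabilities have already been collected into a dict (for example, by calling prob on each value of interest). This is not the inherited ProbDistI implementation, and the generate helper and its arguments are hypothetical. It uses the standard library's random.choices, which rescales the weights by their total. That rescaling matters here because SUM_TO_ONE is False for this class.

```python
import random


def generate(probs: dict, rng: random.Random):
    """Draw one sample; each key is chosen with probability proportional to its value."""
    samples = list(probs)
    # random.choices normalizes the weights, so a small shortfall from 1 is absorbed.
    return rng.choices(samples, weights=[probs[s] for s in samples], k=1)[0]


rng = random.Random(0)
probs = {"a": 1/3, "b": 1/3, "c": 1/4, "d": 1/12}
draws = [generate(probs, rng) for _ in range(10_000)]
print({s: draws.count(s) / len(draws) for s in probs})  # roughly the probabilities above
```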

logprob(sample)

Return the base 2 logarithm of the probability for a given sample.

Parameters: sample (any) – The sample whose probability should be returned.
Return type: float
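A minimal sketch of the base-2 logarithm of a probability. How NLTK itself treats a zero probability is not stated above. This sketch returns float("-inf") in that case, which is an assumption, and the function name logprob_base2 is hypothetical.

```python
import math


def logprob_base2(prob: float) -> float:
    """Base-2 logarithm of a probability; zero is mapped to negative infinity (assumption)."""
    return math.log2(prob) if prob > 0 else float("-inf")


print(logprob_base2(0.25))    # -2.0
print(logprob_base2(1 / 12))  # about -3.585
print(logprob_base2(0.0))     # -inf
```

Summing such log-probabilities instead of multiplying raw probabilities avoids floating-point underflow when scoring long sequences.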

abstract max()

Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.

Return type: any
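Because max() is listed as abstract here, a caller who needs the most probable sample may have to compute it directly. The sketch below assumes the probabilities are already in a dict, and the helper name argmax_sample is hypothetical. Python's built-in max keeps the first maximal item it encounters, which is one valid resolution of the tie behavior left undefined above.

```python
def argmax_sample(probs: dict):
    """Return a sample with the greatest probability; ties go to the first one encountered."""
    return max(probs, key=probs.get)


print(argmax_sample({"a": 1/3, "b": 1/3, "c": 1/4, "d": 1/12}))  # a
```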
