nltk.probability.ConditionalFreqDist

class nltk.probability.ConditionalFreqDist

Bases: defaultdict

A collection of frequency distributions for a single experiment run under different conditions. Conditional frequency distributions are used to record the number of times each sample occurred, given the condition under which the experiment was run. For example, a conditional frequency distribution could be used to record the frequency of each word (type) in a document, given its length. Formally, a conditional frequency distribution can be defined as a function that maps from each condition to the FreqDist for the experiment under that condition.

Conditional frequency distributions are typically constructed by repeatedly running an experiment under a variety of conditions, and incrementing the sample outcome counts for the appropriate conditions. For example, the following code will produce a conditional frequency distribution that encodes how often each word type occurs, given the length of that word type:

>>> from nltk.probability import ConditionalFreqDist
>>> from nltk.tokenize import word_tokenize
>>> sent = "the the the dog dog some other words that we do not care about"
>>> cfdist = ConditionalFreqDist()
>>> for word in word_tokenize(sent):
...     condition = len(word)
...     cfdist[condition][word] += 1

An equivalent way to do this is with the initializer:

>>> cfdist = ConditionalFreqDist((len(word), word) for word in word_tokenize(sent))

The frequency distribution for each condition is accessed using the indexing operator:

>>> cfdist[3]
FreqDist({'the': 3, 'dog': 2, 'not': 1})
>>> cfdist[3].freq('the')
0.5
>>> cfdist[3]['dog']
2
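The value 0.5 above is the count of 'the' divided by the total number of outcomes recorded under condition 3. The following minimal sketch reproduces that arithmetic with the standard library only. It assumes collections.Counter as a stand-in for FreqDist and str.split in place of word_tokenize, so that it runs without NLTK or its tokenizer data; it is an illustration, not NLTK's implementation.

```python
from collections import Counter

sent = "the the the dog dog some other words that we do not care about"

# Count only the three-letter words, i.e. the samples seen under condition 3.
three_letter = Counter(w for w in sent.split() if len(w) == 3)

# Total outcomes under condition 3: the (3) + dog (2) + not (1).
total = sum(three_letter.values())

# Relative frequency = count of the sample / total outcomes for the condition.
relative_freq = three_letter["the"] / total
```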

When the indexing operator is used to access the frequency distribution for a condition that has not been accessed before, ConditionalFreqDist creates a new empty FreqDist for that condition.
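Because the class derives from defaultdict, merely reading an unseen condition adds it to the mapping. The sketch below shows this side effect, and how a membership test avoids it, using defaultdict(Counter) as an assumed stand-in for ConditionalFreqDist (Counter plays the role of FreqDist here).

```python
from collections import Counter, defaultdict

# Stand-in for ConditionalFreqDist: a defaultdict whose factory builds
# an empty counter for every condition that is indexed for the first time.
cfd = defaultdict(Counter)
cfd[3]["the"] += 1

before = sorted(cfd)      # conditions present so far
has_seven = 7 in cfd      # a membership test does not create an entry
after_check = sorted(cfd)

_ = cfd[7]                # plain indexing creates an empty entry for 7
after_index = sorted(cfd)
```

If a lookup should not grow the mapping, test with `in` (or use get()) before indexing.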

__init__(cond_samples=None)

Construct a new conditional frequency distribution. If cond_samples is omitted, the distribution is empty: the count for every sample, under every condition, is zero. Otherwise, each (condition, sample) pair increments the count of that sample under that condition.

Parameters: cond_samples (Sequence of (condition, sample) tuples) – The samples to initialize the conditional frequency distribution with
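As a rough model of what the initializer does with its (condition, sample) pairs, the sketch below builds the same nested counts with the standard library. The helper name build_conditional_counts is hypothetical, and defaultdict(Counter) is an assumed stand-in for the NLTK classes.

```python
from collections import Counter, defaultdict

def build_conditional_counts(cond_samples=None):
    """Hypothetical helper: count each sample under its condition."""
    counts = defaultdict(Counter)
    # With no pairs supplied, the loop body never runs and the result is empty.
    for condition, sample in cond_samples or ():
        counts[condition][sample] += 1
    return counts

sent = "the the the dog dog some other words that we do not care about"
# str.split is used instead of word_tokenize to avoid needing tokenizer data.
pairs = [(len(word), word) for word in sent.split()]

from_pairs = build_conditional_counts(pairs)
empty = build_conditional_counts()
```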

conditions()

Return a list of the conditions that have been accessed for this ConditionalFreqDist. Use the indexing operator to access the frequency distribution for a given condition. Note that the frequency distributions for some conditions may contain zero sample outcomes.

Return type: list

N()

Return the total number of sample outcomes that have been recorded by this ConditionalFreqDist.

Return type: int
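One way to read this total is as the sum of the per-condition totals. The sketch below computes it for the example sentence with a standard-library stand-in (defaultdict(Counter)); the helper name total_outcomes is hypothetical.

```python
from collections import Counter, defaultdict

def total_outcomes(cond_counts):
    """Hypothetical helper: add up the outcome totals of every condition."""
    return sum(sum(counter.values()) for counter in cond_counts.values())

sent = "the the the dog dog some other words that we do not care about"
counts = defaultdict(Counter)
for word in sent.split():
    counts[len(word)][word] += 1

# Outcomes recorded under each condition (word length).
per_condition = {cond: sum(c.values()) for cond, c in counts.items()}
grand_total = total_outcomes(counts)
```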

plot(*args, samples=None, title='', cumulative=False, percents=False, conditions=None, show=True, **kwargs)

Plot the given samples from the conditional frequency distribution. For a cumulative plot, specify cumulative=True. Additional *args and **kwargs are passed to matplotlib’s plot function. (Requires Matplotlib to be installed.)

Parameters

samples (list) – The samples to plot
title (str) – The title for the graph
cumulative (bool) – Whether the plot is cumulative. (default = False)
percents (bool) – Whether the plot uses percents instead of counts. (default = False)
conditions (list) – The conditions to plot (default is all)
show (bool) – Whether to show the plot, or only return the axes object (ax).

tabulate(*args, **kwargs)

Tabulate the given samples from the conditional frequency distribution.

Parameters

samples (list) – The samples to tabulate
conditions (list) – The conditions to tabulate (default is all)
cumulative (bool) – Whether the frequencies are cumulative (default = False)
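To make the three parameters concrete, the sketch below builds the numbers such a table would hold: one row per condition, one column per sample, optionally accumulated from left to right. The helper table_rows and its row/column layout are assumptions for illustration; the text that NLTK actually prints may be laid out differently.

```python
from collections import Counter, defaultdict
from itertools import accumulate

def table_rows(cond_counts, samples, conditions=None, cumulative=False):
    """Hypothetical helper: map each condition to its row of sample counts."""
    if conditions is None:
        conditions = sorted(cond_counts)  # default: every condition
    rows = {}
    for condition in conditions:
        # .get avoids creating an entry for a condition that was never seen.
        counter = cond_counts.get(condition, Counter())
        row = [counter[sample] for sample in samples]
        # A cumulative row holds running totals across the sample columns.
        rows[condition] = list(accumulate(row)) if cumulative else row
    return rows

counts = defaultdict(Counter)
for word in "the the the dog dog not we do".split():
    counts[len(word)][word] += 1

plain = table_rows(counts, samples=["the", "dog", "not"])
running = table_rows(counts, samples=["the", "dog", "not"],
                     conditions=[3], cumulative=True)
```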

deepcopy()

copy()

Return a shallow copy of D.
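The practical difference between a shallow and a deep copy is whether the per-condition distributions are shared. The sketch below demonstrates this with the standard library's copy module on defaultdict(Counter), an assumed stand-in for ConditionalFreqDist holding FreqDist values.

```python
import copy
from collections import Counter, defaultdict

original = defaultdict(Counter)
original[3]["the"] += 1

shallow = copy.copy(original)    # new outer mapping, same inner counters
deep = copy.deepcopy(original)   # inner counters are duplicated as well

original[3]["the"] += 1          # mutate an inner counter in place

shares_inner = shallow[3] is original[3]
shallow_count = shallow[3]["the"]  # sees the later update
deep_count = deep[3]["the"]        # unaffected by the later update
```

A deep copy is the safer choice when the copy will be modified independently of the original.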

__new__(**kwargs)

clear()

Remove all items from D. Returns None.

default_factory

Factory for default value called by __missing__().

fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items()

Return a set-like object providing a view on D's items.

keys()

Return a set-like object providing a view on D's keys.

pop(k[, d])

Remove the specified key and return the corresponding value. If the key is not found, the default d is returned if given; otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F)

Update D from dict/iterable E and F. Returns None.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]

values()

Return an object providing a view on D's values.
