nltk.collections module
- class nltk.collections.OrderedDict
Bases: dict
- popitem()
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, failobj=None)
Insert key with a value of failobj if key is not in the dictionary.
Return the value for key if key is in the dictionary, else failobj.
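Python's built-in dict (3.7+) documents the same popitem() and setdefault() contracts described above, so the short sketch below demonstrates them on a plain dict. It shows the documented ordering and return values only; it does not exercise nltk's OrderedDict itself.

```python
# Plain built-in dict (Python 3.7+), not nltk.collections.OrderedDict:
# its popitem() and setdefault() follow the contracts documented above.
d = {}
d["a"] = 1
d["b"] = 2
d["c"] = 3

# popitem() removes the most recently inserted pair first (LIFO).
last = d.popitem()      # ('c', 3)
second = d.popitem()    # ('b', 2)

# setdefault() inserts only when the key is missing and returns the
# value now stored under that key.
existing = d.setdefault("a", 99)  # 'a' is present: returns 1, no change
inserted = d.setdefault("z")      # 'z' is missing: stores None, returns None

# popitem() on an empty dict raises KeyError.
d.clear()
try:
    d.popitem()
    raised = False
except KeyError:
    raised = True
```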
- class nltk.collections.AbstractLazySequence
Bases: object
An abstract base class for read-only sequences whose values are computed as needed. Lazy sequences act like tuples: they can be indexed, sliced, and iterated over, but they may not be modified.
The most common application of lazy sequences in NLTK is for corpus view objects, which provide access to the contents of a corpus without loading the entire corpus into memory, by loading pieces of the corpus from disk as needed.
The result of modifying a mutable element of a lazy sequence is undefined. In particular, the modifications made to the element may or may not persist, depending on whether and when the lazy sequence caches that element’s value or reconstructs it from scratch.
Subclasses are required to define two methods: __len__() and iterate_from().
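To make the two-method contract concrete, here is a standard-library-only sketch. MiniLazySequence and Squares are hypothetical names, and the way indexing and slicing are derived from iterate_from() here is an assumption of this sketch, not a description of NLTK's base class. In real code you would subclass nltk.collections.AbstractLazySequence instead of MiniLazySequence.

```python
class MiniLazySequence:
    """Hypothetical stand-in for the contract above: subclasses define
    __len__() and iterate_from(start); indexing, slicing and iteration
    are derived from those two methods."""

    def __len__(self):
        raise NotImplementedError("subclass must define __len__()")

    def iterate_from(self, start):
        raise NotImplementedError("subclass must define iterate_from()")

    def __iter__(self):
        return self.iterate_from(0)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slices are materialized as tuples to keep the sketch short.
            return tuple(self[j] for j in range(*i.indices(len(self))))
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        # Pull just the first value from an iterator starting at i.
        return next(self.iterate_from(i))


class Squares(MiniLazySequence):
    """Squares of 0 .. n-1, computed on demand (hypothetical example)."""

    def __init__(self, n):
        self._n = n
        self.computed = 0  # number of values actually produced so far

    def __len__(self):
        return self._n

    def iterate_from(self, start):
        for k in range(start, self._n):
            self.computed += 1
            yield k * k
```

Because Squares(10**9) never materializes its values, reading sq[3] on such an object computes exactly one square, and list(Squares(4)) yields [0, 1, 4, 9].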
- class nltk.collections.LazySubsequence
Bases: AbstractLazySequence
A subsequence produced by slicing a lazy sequence. This slice keeps a reference to its source sequence, and generates its values by looking them up in the source sequence.
- MIN_SIZE = 100
The minimum size for which lazy slices should be created. If LazySubsequence() is called with a subsequence that is shorter than MIN_SIZE, then a tuple will be returned instead.
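The MIN_SIZE rule can be sketched by overriding __new__, which is allowed to return an object of a different type (here a tuple) instead of an instance of the class. SliceView is a hypothetical class, the source is assumed to support slicing and integer indexing, and the threshold of 100 simply mirrors MIN_SIZE above; NLTK's own implementation may differ in its details.

```python
class SliceView:
    """Hypothetical lazy slice: keeps a reference to its source plus
    offsets, and looks values up in the source on demand."""

    MIN_SIZE = 100  # mirrors the documented threshold

    def __new__(cls, source, start, stop):
        if stop - start < cls.MIN_SIZE:
            # Short slices are cheap to copy, so return a plain tuple.
            return tuple(source[start:stop])
        return super().__new__(cls)

    def __init__(self, source, start, stop):
        # Only runs when __new__ returned a SliceView instance.
        self._source = source
        self._start = start
        self._stop = stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, i):
        # This sketch only accepts non-negative offsets within the slice.
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        return self._source[self._start + i]
```

When __new__ returns a tuple, Python skips __init__ entirely. The design trades a small copy for not holding a reference to a possibly large source when the slice is short.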
- class nltk.collections.LazyConcatenation
Bases: AbstractLazySequence
A lazy sequence formed by concatenating a list of lists. This underlying list of lists may itself be lazy.
LazyConcatenation maintains an index that it uses to keep track of the relationship between offsets in the concatenated lists and offsets in the sublists.
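The offset index described above can be illustrated with cumulative start offsets and a binary search. In this standard-library sketch, ConcatView is hypothetical, every sublist is assumed to support len() and indexing, and the whole index is built up front for simplicity; a lazier implementation could instead extend it as sublists are reached.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatView:
    """Hypothetical read-only concatenation of a list of lists."""

    def __init__(self, lists):
        self._lists = lists
        # _offsets[k] is the global index at which lists[k] starts;
        # the final entry is the total length.
        self._offsets = [0, *accumulate(len(sub) for sub in lists)]

    def __len__(self):
        return self._offsets[-1]

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        # Find the last sublist whose start offset is <= i; empty
        # sublists share an offset with their successor and are skipped.
        k = bisect_right(self._offsets, i) - 1
        return self._lists[k][i - self._offsets[k]]
```

A lookup costs O(log n) in the number of sublists, and no element is copied into a combined list.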
- class nltk.collections.LazyMap
Bases: AbstractLazySequence
A lazy sequence whose elements are formed by applying a given function to each element in one or more underlying lists. The function is applied lazily: when you read a value from the list, LazyMap calculates that value by applying its function to the underlying lists’ value(s). LazyMap is essentially a lazy version of the Python built-in function map. In particular, the following two expressions are equivalent:
>>> from nltk.collections import LazyMap
>>> function = str
>>> sequence = [1,2,3]
>>> list(map(function, sequence))
['1', '2', '3']
>>> list(LazyMap(function, sequence))
['1', '2', '3']
If the source lists do not have equal size, then the value None will be supplied for the ‘missing’ elements. This matches the behavior of Python 2's map; Python 3's built-in map instead stops at the end of the shortest list.
Lazy maps can be useful for conserving memory, in cases where individual values take up a lot of space. This is especially true if the underlying list’s values are constructed lazily, as is the case with many corpus readers.
A typical example of a use case for this class is performing feature detection on the tokens in a corpus. Since featuresets are encoded as dictionaries, which can take up a lot of memory, using a LazyMap can significantly reduce memory usage when training and running classifiers.
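A minimal standard-library sketch of the lazy-map idea follows. LazyMapSketch and its calls counter are hypothetical; the sketch assumes list-like inputs with len() and indexing, and follows the None-padding behavior described above. In the feature-detection use case, func would be a feature extractor and the input a corpus's token list, so a featureset dictionary is built only when it is read.

```python
class LazyMapSketch:
    """Hypothetical lazy map: func runs only when an element is read."""

    def __init__(self, func, *lists):
        if not lists:
            raise TypeError("at least one list is required")
        self._func = func
        self._lists = lists
        self.calls = 0  # how many times func has actually run

    def __len__(self):
        # Padding semantics: as long as the longest input list.
        return max(len(lst) for lst in self._lists)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        # Shorter inputs contribute None once they run out.
        args = [lst[i] if i < len(lst) else None for lst in self._lists]
        self.calls += 1
        return self._func(*args)

    def __iter__(self):
        return (self[i] for i in range(len(self)))
```

Nothing is cached, so reading the same element twice calls func twice; that recomputation is the price paid for not keeping every result in memory.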
- class nltk.collections.LazyZip
Bases: LazyMap
A lazy sequence whose elements are tuples, each containing the i-th element from each of the argument sequences. The returned list is truncated in length to the length of the shortest argument sequence. The tuples are constructed lazily: when you read a value from the list, LazyZip calculates that value by forming a tuple from the i-th element of each of the argument sequences. LazyZip is essentially a lazy version of the Python built-in function zip. In particular, an evaluated LazyZip is equivalent to a zip:
>>> from nltk.collections import LazyZip
>>> sequence1, sequence2 = [1, 2, 3], ['a', 'b', 'c']
>>> list(zip(sequence1, sequence2))
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> list(LazyZip(sequence1, sequence2))
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> sequences = [sequence1, sequence2, [6,7,8,9]]
>>> list(zip(*sequences)) == list(LazyZip(*sequences))
True
Lazy zips can be useful for conserving memory in cases where the argument sequences are particularly long.
A typical example of a use case for this class is combining long sequences of gold standard and predicted values in a classification or tagging task in order to calculate accuracy. By constructing tuples lazily and avoiding the creation of an additional long sequence, memory usage can be significantly reduced.
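The accuracy use case can be sketched with the standard library alone. LazyZipSketch and accuracy are hypothetical names; the sketch assumes gold and predicted labels are stored in indexable sequences and truncates to the shortest one, as described above. Each (gold, predicted) tuple exists only while it is being compared.

```python
class LazyZipSketch:
    """Hypothetical lazy zip: builds the i-th tuple only when it is read."""

    def __init__(self, *sequences):
        if not sequences:
            raise TypeError("at least one sequence is required")
        self._sequences = sequences

    def __len__(self):
        # Truncate to the shortest argument sequence, as zip does.
        return min(len(seq) for seq in self._sequences)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        return tuple(seq[i] for seq in self._sequences)

    def __iter__(self):
        return (self[i] for i in range(len(self)))


def accuracy(gold, predicted):
    """Fraction of positions where gold and predicted labels agree."""
    pairs = LazyZipSketch(gold, predicted)
    if len(pairs) == 0:
        raise ValueError("no pairs to score")
    correct = sum(1 for g, p in pairs if g == p)
    return correct / len(pairs)
```

For example, accuracy(['N', 'V', 'N', 'D'], ['N', 'N', 'N', 'D']) returns 0.75 without ever building a list of pairs.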
- class nltk.collections.LazyEnumerate
Bases: LazyZip
A lazy sequence whose elements are tuples, each containing a count (from zero) and a value yielded by the underlying sequence.
LazyEnumerate is useful for obtaining an indexed list. The tuples are constructed lazily: when you read a value from the list, LazyEnumerate calculates that value by forming a tuple from the count of the i-th element and the i-th element of the underlying sequence. LazyEnumerate is essentially a lazy version of the Python built-in function enumerate. In particular, the following two expressions are equivalent:
>>> from nltk.collections import LazyEnumerate
>>> sequence = ['first', 'second', 'third']
>>> list(enumerate(sequence))
[(0, 'first'), (1, 'second'), (2, 'third')]
>>> list(LazyEnumerate(sequence))
[(0, 'first'), (1, 'second'), (2, 'third')]
Lazy enumerations can be useful for conserving memory in cases where the underlying sequence is particularly long.
A typical example of a use case for this class is obtaining an indexed list for a long sequence of values. By constructing tuples lazily and avoiding the creation of an additional long sequence, memory usage can be significantly reduced.
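Because the count in each tuple equals the element's position, a lazy enumerate needs no storage beyond the underlying sequence. The hypothetical LazyEnumerateSketch below shows this with the standard library only; it assumes the sequence supports len() and integer indexing.

```python
class LazyEnumerateSketch:
    """Hypothetical lazy enumerate: element i is (i, sequence[i])."""

    def __init__(self, sequence):
        self._sequence = sequence

    def __len__(self):
        return len(self._sequence)

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        # The count is just the position, so nothing extra is stored.
        return (i, self._sequence[i])

    def __iter__(self):
        return (self[i] for i in range(len(self)))
```

Wrapping range(10**12) this way costs almost no memory, yet any element, including the last, can be read directly.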
- class nltk.collections.LazyIteratorList
Bases: AbstractLazySequence
Wraps an iterator, loading its elements on demand and making them subscriptable. __repr__ displays only the first few elements.
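One way to make an iterator subscriptable on demand is to cache consumed items in a list and advance the iterator only as far as the highest index requested. IteratorListSketch is a hypothetical, standard-library-only illustration of that idea; the three-element repr cutoff is an assumption standing in for "the first few elements".

```python
from itertools import count


class IteratorListSketch:
    """Hypothetical iterator wrapper: pulls items only up to the highest
    index requested and caches them for later reads."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill_to(self, i):
        # Advance the iterator until index i is cached; return False if
        # the iterator runs out first.
        while len(self._cache) <= i:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                return False
        return True

    def __getitem__(self, i):
        # Negative indices would require exhausting the iterator to learn
        # its length, so this sketch rejects them.
        if i < 0 or not self._fill_to(i):
            raise IndexError("index out of range")
        return self._cache[i]

    def __repr__(self):
        more = self._fill_to(3)  # True if a fourth element exists
        head = ", ".join(repr(x) for x in self._cache[:3])
        return f"[{head}, ...]" if more else f"[{head}]"


# Usage: an infinite iterator is safe because only requested items are pulled.
naturals = IteratorListSketch(count())
print(naturals[5])     # 5
print(repr(naturals))  # [0, 1, 2, ...]
```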
- class nltk.collections.Trie
Bases: dict
A Trie implementation for strings.
- LEAF = True
- __init__(strings=None)
Builds a Trie object, which is built around a dict. If strings (a list of strings) is provided, each string is added to the Trie; otherwise, an empty Trie is constructed.
- Parameters
strings (list(str)) – List of strings to insert into the trie (default is None)
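A dict-based trie can be sketched as nested dictionaries keyed by character. The LEAF = True attribute suggests a sentinel key marking where an inserted string ends, and this sketch assumes that layout. TrieSketch and its insert, contains, and starts_with methods are hypothetical names; this is not NLTK's implementation.

```python
class TrieSketch(dict):
    """Hypothetical dict-based trie: each character key maps to a child
    trie, and the LEAF key marks the end of an inserted string."""

    LEAF = True  # sentinel key; cannot clash with one-character str keys

    def __init__(self, strings=None):
        super().__init__()
        for string in strings or []:
            self.insert(string)

    def insert(self, string):
        node = self
        for ch in string:
            # Create the child trie only if this character is new here.
            node = node.setdefault(ch, TrieSketch())
        node[TrieSketch.LEAF] = None

    def _find(self, prefix):
        # Walk down one character at a time; None means "no such path".
        node = self
        for ch in prefix:
            if ch not in node:
                return None
            node = node[ch]
        return node

    def contains(self, string):
        node = self._find(string)
        return node is not None and TrieSketch.LEAF in node

    def starts_with(self, prefix):
        return self._find(prefix) is not None
```

With TrieSketch(["cat", "car", "dog"]), contains("car") is True while contains("ca") is False, because "ca" is only a prefix and its node carries no LEAF marker.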