nltk.corpus.reader.switchboard module
- class nltk.corpus.reader.switchboard.SwitchboardTurn
Bases: list
A specialized list object used to encode Switchboard utterances. The elements of the list are the words in the utterance, and two attributes, speaker and id, are provided to retrieve the speaker identifier and the utterance id. Note that utterance ids are only unique within a given discourse.
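As an illustration of the structure described above, here is a minimal sketch of a list subclass that carries speaker and id attributes. TurnSketch and the sample data are hypothetical and are not NLTK's SwitchboardTurn source. Because utterance ids repeat across discourses, the sketch keys each turn by the pair (discourse index, id) rather than by id alone.

```python
class TurnSketch(list):
    """Hypothetical stand-in: a list of words plus speaker and id attributes."""

    def __init__(self, words, speaker, utterance_id):
        super().__init__(words)        # the list elements are the utterance's words
        self.speaker = speaker         # speaker identifier
        self.id = utterance_id         # unique only within one discourse


# Two hypothetical discourses whose utterance ids overlap (both contain id 1).
discourses = [
    [TurnSketch(["uh", "hello"], "A", 1), TurnSketch(["hi", "there"], "B", 2)],
    [TurnSketch(["okay"], "A", 1)],
]

# Keying by id alone would let discourse 1 overwrite discourse 0's turn 1,
# so combine the discourse position with the utterance id.
index = {
    (d, turn.id): turn
    for d, discourse in enumerate(discourses)
    for turn in discourse
}

first = index[(0, 1)]
print(list(first), first.speaker)  # ['uh', 'hello'] A
print(len(index))                  # 3
```

Subclassing list keeps ordinary list operations (len, iteration, slicing) working on the words, while the extra attributes hold per-utterance metadata.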
- class nltk.corpus.reader.switchboard.SwitchboardCorpusReader
Bases: CorpusReader
- __init__(root, tagset=None)
- Parameters
root (PathPointer or str) – A path pointer identifying the root directory for this corpus. If a string is specified, it will be converted to a PathPointer automatically.
fileids – A list of the files that make up this corpus. This list can be specified either explicitly, as a list of strings, or implicitly, as a regular expression over file paths. The absolute path for each file will be constructed by joining the reader’s root to each file name.
encoding – The default unicode encoding for the files that make up the corpus. The value of encoding can be any of the following:
  - A string: encoding is the encoding name for all files.
  - A dictionary: encoding[file_id] is the encoding name for the file whose identifier is file_id. If file_id is not in encoding, then the file contents will be processed using non-unicode byte strings.
  - A list: encoding should be a list of (regexp, encoding) tuples. The encoding for a file whose identifier is file_id will be the encoding value for the first tuple whose regexp matches the file_id. If no tuple’s regexp matches the file_id, the file contents will be processed using non-unicode byte strings.
  - None: the file contents of all files will be processed using non-unicode byte strings.
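tagset – The name of the tagset used by this corpus, to be used for normalizing or converting the POS tags returned by the tagged_...() methods.
Note: the fileids and encoding entries above come from the CorpusReader base-class documentation; the __init__ signature shown here accepts only root and tagset.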
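The fileids rule (an explicit list of names, or a regular expression over file paths, each joined to the root) can be sketched as follows. resolve_fileids is a hypothetical helper, not part of NLTK. It uses pathlib.Path in place of NLTK's PathPointer and assumes that a regular expression must match the whole relative path. NLTK's own matching rules may differ.

```python
import os
import re
import tempfile
from pathlib import Path


def resolve_fileids(root, fileids):
    """Hypothetical helper: return absolute paths for fileids under root.

    fileids may be a list of relative names (explicit form) or a single
    regexp string matched against relative paths (implicit form).
    """
    root = Path(root)  # a plain string root is converted to a path object
    if isinstance(fileids, str):
        # Implicit form: walk the tree and keep paths the regexp fully matches.
        pattern = re.compile(fileids)
        found = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = (Path(dirpath) / name).relative_to(root).as_posix()
                if pattern.fullmatch(rel):
                    found.append(rel)
        fileids = sorted(found)
    # Explicit (or expanded) form: join the root to each file name.
    return [(root / fid).absolute() for fid in fileids]


# Usage on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.txt", "b.txt", "notes.md", "sub/c.txt"]:
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder")
    txt_paths = resolve_fileids(tmp, r".*\.txt")
    txt_names = [p.name for p in txt_paths]
    all_absolute = all(p.is_absolute() for p in txt_paths)
    explicit_ok = resolve_fileids(tmp, ["notes.md"]) == [(Path(tmp) / "notes.md").absolute()]

print(txt_names)  # ['a.txt', 'b.txt', 'c.txt']
```

Expanding the regexp once, up front, gives the rest of the reader a fixed, sorted list of file ids to work with.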
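The four forms accepted by encoding resolve to a per-file encoding as in this sketch. encoding_for is a hypothetical helper, and returning None stands in for "process the file as non-unicode byte strings". For the list form it assumes that, as with Python's re.match, a regexp only needs to match at the start of file_id.

```python
import re


def encoding_for(encoding, file_id):
    """Hypothetical helper: encoding name for file_id, or None for raw bytes."""
    if encoding is None:
        return None                   # every file is read as byte strings
    if isinstance(encoding, str):
        return encoding               # one encoding for all files
    if isinstance(encoding, dict):
        return encoding.get(file_id)  # missing key -> byte strings
    if isinstance(encoding, list):
        for regexp, enc in encoding:  # the first matching tuple wins
            if re.match(regexp, file_id):
                return enc
        return None                   # no regexp matched -> byte strings
    raise TypeError(f"unsupported encoding specification: {encoding!r}")


# Hypothetical file ids and rules; "disc" also matches the .pos file,
# and because it comes first its encoding is the one used.
rules = [(r"disc", "utf8"), (r".*\.pos$", "ascii"), (r".*\.txt$", "latin-1")]
print(encoding_for(rules, "disc01.pos"))  # utf8
print(encoding_for(rules, "other.pos"))   # ascii
print(encoding_for(rules, "data.bin"))    # None
```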
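To show what "normalizing or converting the POS tags" via a tagset means, the following sketch converts (word, tag) pairs through a mapping table. It assumes Penn Treebank-style source tags. The mapping PTB_TO_UNIVERSAL is a deliberately tiny, hypothetical excerpt, and sending unmapped tags to "X" is an assumption of this sketch rather than documented reader behavior.

```python
# Hypothetical, tiny excerpt of a Penn Treebank -> universal tag mapping;
# a real mapping covers many more tags.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "VB": "VERB", "VBD": "VERB",
    "DT": "DET", "PRP": "PRON", "JJ": "ADJ", "RB": "ADV",
}


def convert_tags(tagged_words, mapping):
    """Replace each tag with its mapped tag; unmapped tags become 'X' (assumption)."""
    return [(word, mapping.get(tag, "X")) for word, tag in tagged_words]


sentence = [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("dogs", "NNS"), ("um", "UH")]
print(convert_tags(sentence, PTB_TO_UNIVERSAL))
# [('I', 'PRON'), ('saw', 'VERB'), ('the', 'DET'), ('dogs', 'NOUN'), ('um', 'X')]
```

Recording the corpus's source tagset on the reader is what makes this kind of conversion possible, because the mapping table is chosen by the (source, target) pair.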