nltk.tokenize.sexpr_tokenize
- nltk.tokenize.sexpr_tokenize(text)
Return a list of s-expressions extracted from text. For example:
>>> SExprTokenizer().tokenize('(a b (c d)) e f (g)')
['(a b (c d))', 'e', 'f', '(g)']
All parentheses are assumed to mark s-expressions. (No special processing is done to exclude parentheses that occur inside strings, or following backslash characters.)
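To illustrate this caveat, here is a minimal sketch that assumes parentheses are located by plain character matching. It is not NLTK's code, and the sample string is made up for illustration. It shows that a parenthesis inside a quoted string, or one following a backslash, still changes the nesting depth:

```python
import re

# Assumption: parentheses are found by plain character matching, with no
# awareness of quotes or backslashes (this mirrors the caveat, not NLTK's code).
text = r'(say "hi :)" \( now)'
parens = [m.group() for m in re.finditer(r"[()]", text)]

# Track the nesting depth after each parenthesis, never going below zero.
depth, depths = 0, []
for p in parens:
    depth = depth + 1 if p == "(" else max(0, depth - 1)
    depths.append(depth)

print(parens)  # ['(', ')', '(', ')']
print(depths)  # [1, 0, 1, 0]
```

The depth returns to zero at the quoted `)`, so a scanner of this kind would close the first s-expression there instead of at the final parenthesis.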
If the given expression contains non-matching parentheses, then the behavior of the tokenizer depends on the strict parameter to the constructor. If strict is True, then raise a ValueError. If strict is False, then any unmatched close parentheses will be listed as their own s-expression; and the last partial s-expression with unmatched open parentheses will be listed as its own s-expression:
>>> SExprTokenizer(strict=False).tokenize('c) d) e (f (g')
['c', ')', 'd', ')', 'e', '(f (g']
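The strict and non-strict behavior described above can be sketched in plain Python. The function name `sketch_sexpr_tokenize` is hypothetical, and this is an illustrative re-implementation of the documented behavior, not NLTK's source; its handling of text that trails the last s-expression is an assumption.

```python
import re

def sketch_sexpr_tokenize(text, strict=True):
    """Hypothetical sketch of the documented behavior (not NLTK's code)."""
    tokens = []
    pos = 0      # start of the text not yet emitted
    depth = 0    # current parenthesis nesting depth
    for m in re.finditer(r"[()]", text):
        if depth == 0:
            # Outside any s-expression: emit whitespace-separated tokens.
            tokens.extend(text[pos:m.start()].split())
            pos = m.start()
        if m.group() == "(":
            depth += 1
        else:
            if strict and depth == 0:
                raise ValueError(f"Unmatched close paren at char {m.start()}")
            depth = max(0, depth - 1)
            if depth == 0:
                # A complete top-level s-expression (or a lone ')').
                tokens.append(text[pos:m.end()])
                pos = m.end()
    if strict and depth > 0:
        raise ValueError(f"Unmatched open paren at char {pos}")
    if depth > 0:
        tokens.append(text[pos:])          # trailing partial s-expression
    else:
        tokens.extend(text[pos:].split())  # assumption: split any plain tail
    return tokens

print(sketch_sexpr_tokenize('(a b (c d)) e f (g)'))
print(sketch_sexpr_tokenize('c) d) e (f (g', strict=False))
```

On the two inputs from the examples above, this sketch produces the same lists as shown in the documentation. With `strict=True`, the second input raises `ValueError` at the first unmatched close parenthesis.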
- Parameters
text (str or iter(str)) – the string to be tokenized
- Return type
iter(str)