# -*- coding: utf-8 -*-
# Natural Language Toolkit: Tokenizers
#
# Copyright (C) 2001-2017 NLTK Project
# Author: Edward Loper
#         Steven Bird (minor additions)
# Contributors: matthewmc, clouds56
# URL:
# For license information, see LICENSE.TXT
r"""
NLTK Tokenizer Package

Tokenizers divide strings into lists of substrings. For example,
tokenizers can be used to find the words and punctuation in a string:

    >>> from nltk.tokenize import word_tokenize
    >>> s = '''Good muffins cost $3.88\nin New York. Please buy me
    ... two of them.\n\nThanks.'''
    >>> word_tokenize(s)
    ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.',
    'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']

This particular tokenizer requires the Punkt sentence tokenization
models to be installed.
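
The same Punkt models also drive sentence splitting. A sentence-level
counterpart to the example above (a minimal sketch, assuming the Punkt
English model has been fetched once via ``nltk.download('punkt')``; the
splits shown are what that model typically produces for ``s``):

    >>> from nltk.tokenize import sent_tokenize
    >>> sent_tokenize(s)
    ['Good muffins cost $3.88\nin New York.', 'Please buy me\ntwo of them.', 'Thanks.']
"""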