
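How can I word-tokenize a sentence with nltk while keeping track of whether each token is followed by a space?

For example, given the following sentence and its tokens: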

import nltk
sent = u"A sentence that contains words but don't contain numbers"
tokens = nltk.tokenize.word_tokenize(sent)
#tokens=['A',
# 'sentence',
# 'that',
# 'contains',
# 'words',
# 'but',
# 'do',
# "n't",
# 'contain',
# 'numbers']

I would like to obtain the following list:

spaces = [True, True, True, True, True, True, False, True, True, False]

which tells, for each token in the list tokens, whether it is followed by a space in the original sentence sent.
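One possible way to build such a list is to align each token back to the original string and check the character right after it. The sketch below is only an illustration under stated assumptions: the helper name trailing_spaces is hypothetical (it is not part of nltk), the token list is copied from the example above rather than produced by calling nltk, and every token is assumed to appear verbatim in sent.

```python
def trailing_spaces(sent, tokens):
    """Hypothetical helper (not an nltk function): for each token, report
    whether a whitespace character immediately follows it in sent."""
    flags = []
    pos = 0  # offset in sent just past the previously matched token
    for tok in tokens:
        # Search from the end of the previous token so repeated words
        # are matched in order.
        start = sent.find(tok, pos)
        if start == -1:
            # Assumption violated: the tokenizer altered this token.
            raise ValueError(f"token {tok!r} not found after offset {pos}")
        pos = start + len(tok)
        # A token at the very end of the sentence has no trailing space.
        flags.append(pos < len(sent) and sent[pos].isspace())
    return flags


sent = "A sentence that contains words but don't contain numbers"
# Token list copied from the question; in practice it would come from
# nltk.tokenize.word_tokenize(sent).
tokens = ['A', 'sentence', 'that', 'contains', 'words', 'but',
          'do', "n't", 'contain', 'numbers']
spaces = trailing_spaces(sent, tokens)
print(spaces)
```

For the example sentence this yields the spaces list shown above. Note that word_tokenize rewrites double quotes as `` and '', so a sentence containing them would break the verbatim-match assumption and raise ValueError here; such tokens would need to be mapped back before aligning.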

Trailing whitespace with the nltk word tokenizer

0 answers: