import nltk
from nltk.tokenize import PunktSentenceTokenizer, WordPunctTokenizer

file = open("C:\\Users\\file.txt")
text = file.read()

def ie_preprocess(text):
    sent_tokenizer = PunktSentenceTokenizer(text)
    sents = sent_tokenizer.tokenize(text)
    print(sents)
    word_tokenizer = WordPunctTokenizer()
    words = nltk.word_tokenize(sents)
    print(words)
    tagged = nltk.pos_tag(words)
    print(tagged)

ie_preprocess(text)
Answer 0 (score: 1)
nltk.word_tokenize() takes text, which should be a string, but you are passing it sents, which is a list of sentences. Instead, you want:
words = nltk.word_tokenize(text)
If you want to tokenize each sentence into a list of words and get back a list of lists, you can use:
words = [nltk.word_tokenize(sentence) for sentence in sents]
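To see the difference between the two shapes (one flat word list vs. a list of lists, one inner list per sentence) without needing the NLTK data files downloaded, here is a minimal sketch that uses str.split() as a stand-in tokenizer; this is a simplification for illustration only, since nltk.word_tokenize also handles punctuation and contractions:

```python
text = "NLTK is great. It tokenizes text."

# Stand-in sentence tokenizer: split on periods.
sents = [s.strip() for s in text.split(".") if s.strip()]

# Tokenizing the whole text at once gives a single flat list of words.
flat_words = text.replace(".", "").split()
print(flat_words)       # ['NLTK', 'is', 'great', 'It', 'tokenizes', 'text']

# Tokenizing each sentence separately gives a list of lists.
words_per_sent = [s.split() for s in sents]
print(words_per_sent)   # [['NLTK', 'is', 'great'], ['It', 'tokenizes', 'text']]
```

The second form is what the list comprehension above produces, and it is what you would pass to nltk.pos_tag one sentence at a time.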