I am getting this error: "expected string or buffer"

Time: 2015-08-03 08:32:34

Tags: python-3.x

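import nltk
from nltk.tokenize import PunktSentenceTokenizer, WordPunctTokenizer

# Read the whole input file into a single string
with open("C:\\Users\\file.txt") as file:
    text = file.read()

def ie_preprocess(text):
    # Train a Punkt sentence tokenizer on the text, then split it into sentences
    sent_tokenizer = PunktSentenceTokenizer(text)
    sents = sent_tokenizer.tokenize(text)
    print(sents)
    word_tokenizer = WordPunctTokenizer()  # created but never used below
    words = nltk.word_tokenize(sents)  # sents is a list of sentences here
    print(words)

    tagges = nltk.pos_tag(words)
    print(tagges)

ie_preprocess(text)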

1 answer:

Answer 0 (score: 1)

nltk.word_tokenize() takes text, which should be a string, but you are passing sents, which is a list of sentences.

Instead, you want:

words = nltk.word_tokenize(text)

If you want to tokenize each sentence into a list of words and get the result back as a list of lists, you can use

words = [nltk.word_tokenize(sentence) for sentence in sents]
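
To see the type mismatch without installing NLTK or its data files, here is a minimal sketch using only the standard library. simple_word_tokenize is a hypothetical regex-based stand-in for nltk.word_tokenize, not NLTK's actual implementation, and the two sample sentences are made up. It assumes only that the tokenizer, like a regex function, accepts a single string.

```python
import re

def simple_word_tokenize(text):
    """Hypothetical stand-in for nltk.word_tokenize (not NLTK's real code).

    Splits one string into word tokens and punctuation tokens.
    """
    return re.findall(r"\w+|[^\w\s]+", text)

# Made-up sample data standing in for the output of the sentence tokenizer.
sents = ["Hello world.", "This is a test."]

# Passing the whole list reproduces the kind of TypeError from the question,
# because the regex function expects a string, not a list.
try:
    simple_word_tokenize(sents)
    raised = False
except TypeError:
    raised = True

# Tokenizing one sentence at a time yields a list of token lists.
words = [simple_word_tokenize(sentence) for sentence in sents]

print(raised)
print(words)
```

The exact wording of the TypeError depends on the Python version. Also note that with the list-of-lists form, a later call such as nltk.pos_tag(words) would likely need to run once per sentence, since it expects a flat list of tokens.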