I'm working on Sentiment Analysis code. I want to use a stemmer in my snippet, but when I print the result, the stemming doesn't seem to have any effect. Do you know what I'm doing wrong? Here is my snippet:
pos_data = []
with open('Positive.txt') as f:
    for line in f:
        pos_data.append([format_sentence(line), 'pos'])
    for line in f:
        stemmer.stem(pos_data)
print(pos_data)
Answer 0 (score: 0)
You need to split the file into lines, and probably split each line into words as well (i.e. tokenize it):
>>> import nltk
>>> from nltk import PorterStemmer
>>> test = 'this sentence is just a tester set of words'
>>> test_tokenize = nltk.word_tokenize(test)
>>> test_tokenize
['this', 'sentence', 'is', 'just', 'a', 'tester', 'set', 'of', 'words']
>>> port = PorterStemmer()
>>> for word in test_tokenize:
...     print(port.stem(word))
...
thi
sentenc
is
just
a
tester
set
of
word
with open('Positive.txt') as f:
    for line in f:
        words = nltk.word_tokenize(line)
        for word in words:
            print(port.stem(word))
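If you want to keep the stems instead of just printing them, a minimal sketch (assuming the same port stemmer from the session above, and that you want one list of stems per line) could look like this:

stemmed_lines = []
with open('Positive.txt') as f:
    for line in f:
        words = nltk.word_tokenize(line)
        # collect the stemmed tokens for this line instead of printing them
        stemmed_lines.append([port.stem(word) for word in words])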
Answer 1 (score: 0)
It looks like you are not calling the stemmer API correctly: it takes one token at a time, which means you need to tokenize your sentence first. See the documentation here: http://www.nltk.org/howto/stem.html
Also, for future reference, you should include complete working code, the imports, and the stack trace of the error.
with open('Positive.txt') as f:
    for line in f:
        tokens = format_sentence(line).split()  # tokenize using spaces
        stem_sentence = ' '.join([stemmer.stem(token) for token in tokens])
        pos_data.append([stem_sentence, 'pos'])
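For completeness, here is a self-contained sketch of the whole snippet. It assumes stemmer is an NLTK PorterStemmer and uses a placeholder for format_sentence (your actual helper isn't shown, so it is assumed here to just lowercase and strip the line):

import nltk
from nltk.stem import PorterStemmer

def format_sentence(line):
    # placeholder for your own helper; assumed to lowercase and strip the line
    return line.lower().strip()

stemmer = PorterStemmer()
pos_data = []

with open('Positive.txt') as f:
    for line in f:
        # tokenize into words before stemming, since the stemmer takes one token at a time
        tokens = nltk.word_tokenize(format_sentence(line))
        stem_sentence = ' '.join(stemmer.stem(token) for token in tokens)
        pos_data.append([stem_sentence, 'pos'])

print(pos_data)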