I want to stem the words in a file. Stemming works when I try it in the terminal, but not when I apply it to a text file. Terminal code:
print PorterStemmer().stem_word('complications')
Function code:
def stemming_text_1():
    with open('test.txt', 'r') as f:
        text = f.read()
    print text
    singles = []
    stemmer = PorterStemmer() #problem from HERE
    for plural in text:
        singles.append(stemmer.stem(plural))
    print singles
Input test.txt:
126211 crashes bookmarks runs error logged debug core bookmarks
126262 manual change crashes bookmarks propagated ion view bookmarks
Expected output:
126211 crash bookmark runs error logged debug core bookmark
126262 manual change crash bookmark propagated ion view bookmark
Any suggestions would be much appreciated, thanks :)
Answer 0 (score: 2)
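You need to split the text into words for the stemmer to work. At the moment, the variable text holds the entire file as one big string, so the loop for plural in text: assigns each character of text to plural, and the stemmer is called on single characters rather than words.
Try for plural in text.split(): instead.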
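The difference between the two loops can be seen without any stemmer at all. The snippet below is a minimal sketch using a shortened sample of the input; the variable names chars and words are only for illustration.

```python
text = "126211 crashes bookmarks runs"

# Iterating over a string yields one character at a time,
# which is what the original loop handed to the stemmer.
chars = [item for item in text]

# Splitting on whitespace yields whole words instead.
words = [item for item in text.split()]

print(chars[:8])
print(words)
```

Only the second form gives the stemmer something it can actually reduce.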
[Edit] To get the output in the format you want, you need to read the file line by line instead of reading it all at once:
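from nltk.stem import PorterStemmer

def stemming_text_1():
    stemmer = PorterStemmer()  # create the stemmer once, not once per line
    with open('test.txt', 'r') as f:
        for line in f:
            singles = []
            for plural in line.split():  # split the line into words
                singles.append(stemmer.stem(plural))
            # one stemmed line per input line; print(...) with a single
            # argument behaves the same on Python 2 and Python 3
            print(' '.join(singles))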
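If it helps to test the line handling separately from NLTK, the same logic can be written as a function that takes the stemming function as a parameter. This is only a sketch: stem_lines and toy_stem are hypothetical names, and toy_stem is a crude stand-in that strips a plural ending, not the Porter algorithm.

```python
def stem_lines(lines, stem):
    """Stem each whitespace-separated word, keeping one output line per input line."""
    result = []
    for line in lines:
        result.append(' '.join(stem(word) for word in line.split()))
    return result


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: only removes a plural ending.
    if word.endswith('shes'):
        return word[:-2]
    if word.endswith('s') and not word.endswith('ss'):
        return word[:-1]
    return word


sample = ["126211 crashes bookmarks runs error\n",
          "126262 manual change crashes\n"]
stemmed = stem_lines(sample, toy_stem)
for line in stemmed:
    print(line)
```

With NLTK installed, passing PorterStemmer().stem as stem and the open file object as lines should give the real stems, though Porter output will generally differ from this toy version.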