Python: stemming words in a file

Date: 2013-05-30 11:50:03

Tags: python nltk stemming

I want to stem the words in a file. It works fine when I use it in the terminal, but when I apply it to a text file, it does not work. Terminal code:

print PorterStemmer().stem_word('complications')

Function code:

def stemming_text_1():
    with open('test.txt', 'r') as f:
        text = f.read()
        print text
        singles = []    

        stemmer = PorterStemmer() #problem from HERE
        for plural in text:
            singles.append(stemmer.stem(plural))
        print singles

Input test.txt:

126211 crashes bookmarks runs error logged debug core bookmarks
126262 manual change crashes bookmarks propagated ion view bookmarks

Expected output:

126211 crash bookmark runs error logged debug core bookmark
126262 manual change crash bookmark propagated ion view bookmark

Any suggestions are much appreciated, thanks :)

1 answer:

Answer 0 (score: 2)

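You need to split the text into words for the stemmer to work. Currently, the variable text holds the entire file as one big string. The loop for plural in text: assigns each character of text to plural, so the stemmer is called on single characters rather than on words.

Try for plural in text.split(): instead.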
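
To make the difference concrete, here is a small sketch that needs no NLTK. The sample string is taken from the first line of test.txt; the variable names are only illustrative.

```python
text = "126211 crashes bookmarks runs"

# Iterating over a string yields one character at a time.
pieces_from_string = [piece for piece in text]

# Iterating over text.split() yields whitespace-separated words.
pieces_from_split = [piece for piece in text.split()]

print(pieces_from_string[:8])
print(pieces_from_split)
```

The first list starts with the individual digits of 126211, which is what the stemmer was receiving in the original loop; the second list holds whole words.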

[Edit] To get the output in the format you want, you need to read the file line by line rather than all at once:

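from nltk.stem import PorterStemmer

def stemming_text_1():
    stemmer = PorterStemmer()  # create the stemmer once, not once per line
    with open('test.txt', 'r') as f:
        for line in f:
            print(line)  # echo the original line
            singles = []

            # Split the line into words and stem each word.
            for plural in line.split():
                singles.append(stemmer.stem(plural))
            print(' '.join(singles))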
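
If you would rather collect the stemmed lines than print them, the same line-by-line idea can be sketched as a function that takes the stemming function as a parameter. This is only an illustration: stem_lines and toy_stem are hypothetical names, and toy_stem is a crude stand-in used so the example runs without NLTK. It is not the Porter algorithm.

```python
def stem_lines(lines, stem):
    """Return each line with every whitespace-separated word passed through stem."""
    stemmed = []
    for line in lines:
        words = line.split()  # split() also discards the trailing newline
        stemmed.append(' '.join(stem(word) for word in words))
    return stemmed


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer, NOT the Porter algorithm:
    # turns "...shes" into "...sh", otherwise drops a single trailing "s".
    if word.endswith('shes'):
        return word[:-2]
    if word.endswith('s') and not word.endswith('ss'):
        return word[:-1]
    return word


sample = ["126211 crashes bookmarks runs error\n"]
print(stem_lines(sample, toy_stem))
```

With NLTK installed, you could pass PorterStemmer().stem as the stem argument and an open file object as lines. Be aware that the real Porter stemmer will likely reduce more words than the expected output in the question shows; for example, it stems runs to run.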