PorterStemmer doesn't seem to work

Date: 2012-10-19 12:11:00

Tags: python nltk porter-stemmer

I'm new to Python and am working through the examples in a book. Can anyone explain why nothing changes when I try to stem some text with this code?

>>> from nltk.stem import PorterStemmer
>>> stemmer=PorterStemmer()
>>> stemmer.stem('numpang wifi stop gadget shopping')
'numpang wifi stop gadget shopping'

But when I pass a single word, it works:

>>> stemmer.stem('shopping')
'shop'

3 answers:

Answer 0 (score: 11)

Try this:

res = ",".join([stemmer.stem(kw) for kw in 'numpang wifi stop gadget shopping'.split(" ")])

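The problem is that the stemmer is meant to work on single words: it treats the whole string you pass it as one word. Your string as a whole has no "root" word, whereas the word "shopping" on its own has the root "shop". So you have to stem each word separately.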
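To see why, here is a deliberately simplified suffix stripper. It is a hypothetical toy written for illustration, much cruder than the real Porter algorithm, but like a real stemmer it only inspects the end of whatever string it is given. Applied to a whole sentence it can only ever affect the last word; applied word by word it reaches every word.

```python
# Hypothetical toy stemmer: NOT the Porter algorithm, only an illustration
# that a suffix stripper looks at the end of its input string.
SUFFIXES = ("ing", "s")

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

sentence = "cats jumping dogs"

# Whole sentence treated as one "word": only the final "s" can be stripped.
print(toy_stem(sentence))  # → cats jumping dog

# Word by word: every word gets its own chance to match a suffix.
print(" ".join(toy_stem(w) for w in sentence.split()))  # → cat jump dog
```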

Edit:

From their source code:

Stemming algorithms attempt to automatically remove suffixes (and in some
cases prefixes) in order to find the "root word" or stem of a given word. This
is useful in various natural language processing scenarios, such as search.

So I guess you really do have to split the string yourself.
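If you need this in more than one place, the split-stem-rejoin step can be wrapped in a small helper. This is a minimal sketch; the name `stem_sentence` is hypothetical, and the stemmer is passed in as any callable that maps one word to its stem (for example the bound method `PorterStemmer().stem`), so the helper itself has no NLTK dependency.

```python
from typing import Callable

def stem_sentence(text: str, stem_word: Callable[[str], str], sep: str = " ") -> str:
    """Split text on whitespace, stem each word on its own, and rejoin with sep."""
    # str.split() with no argument splits on runs of whitespace,
    # so double spaces or tabs do not produce empty "words".
    return sep.join(stem_word(word) for word in text.split())
```

With NLTK this would be called as `stem_sentence('numpang wifi stop gadget shopping', PorterStemmer().stem)`, which should turn "shopping" into "shop" just as the single-word call in the question does; passing `sep=","` gives the comma-joined form of the one-liner in this answer.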

Answer 1 (score: 3)

Stemming is the process of reducing a given word to its base or root form. Here you are trying to stem an entire sentence.

Do it like this instead:

from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

sentence = "numpang wifi stop gadget shopping"
tokens = word_tokenize(sentence)  # split the sentence into individual words
stemmer = PorterStemmer()

output = [stemmer.stem(word) for word in tokens]  # stem each word separately
print(output)
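`word_tokenize` relies on NLTK's Punkt tokenizer data, which may have to be downloaded with `nltk.download(...)` before first use (the exact resource name depends on the NLTK version). If you only need plain words, a regular expression from the standard library can stand in for the tokenizer. This is an assumption-laden sketch, not what NLTK does internally: the function names are hypothetical, and the pattern keeps only runs of ASCII letters and apostrophes, so digits and punctuation are dropped and hyphenated words are split. Unlike a bare `str.split()`, it keeps trailing punctuation such as "shopping!" from reaching the stemmer.

```python
import re
from typing import Callable, List

# Hypothetical tokenizer: runs of letters (and apostrophes) count as words;
# everything else (spaces, punctuation, digits) acts as a separator.
WORD_RE = re.compile(r"[A-Za-z']+")

def tokenize(text: str) -> List[str]:
    return WORD_RE.findall(text)

def stem_tokens(text: str, stem_word: Callable[[str], str]) -> List[str]:
    """Tokenize text, then stem each token on its own."""
    return [stem_word(token) for token in tokenize(text)]

print(tokenize("numpang wifi, stop gadget shopping!"))
# → ['numpang', 'wifi', 'stop', 'gadget', 'shopping']
```

Calling `stem_tokens(sentence, PorterStemmer().stem)` would then produce a list of stems in the same shape as `output` above.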

Answer 2 (score: 0)

Try this:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stemmer = PorterStemmer()

some_text = "numpang wifi stop gadget shopping"

words = word_tokenize(some_text)

for word in words:
    print(stemmer.stem(word))