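Hello, I am trying to stem words with a Python stemmer. I tried Porter and Lancaster, but they both have the same problem: they don't correctly stem words that end with "er" or "e".
For example, they stem computer --> comput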
rotate --> rotat
Here is part of the code:
line=line.lower()
line=re.sub(r'[^a-z0-9 ]',' ',line)
line=line.split()
line=[x for x in line if x not in stops]
line=[ porter.stem(word, 0, len(word)-1) for word in line]
# or 'line=[ st.stem(word) for word in line]'
return line
Any ideas on how to solve this?
Answer 0 (score: 1)
Quoting the page on Wikipedia: "In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma (linguistics) is "produce", however the stem is "produc": this is because there are words such as production."
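So your code is probably giving you correct results. You seem to be expecting a lemma, which is not what a stemmer produces (unless the lemma happens to be identical to the stem).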
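If what you want is a readable base form, a lemmatizer (for example NLTK's WordNetLemmatizer) is the usual tool. If you would rather keep the stemmer, one possible workaround is to map every stem back to the most frequent original word that produced it. Below is a minimal sketch of that idea under some assumptions: `toy_stem` is a hypothetical stand-in for your `porter.stem` call (it only strips a few suffixes so the example is self-contained), and `build_stem_map` is a name made up for illustration.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ers", "er", "ing", "ed", "es", "e", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_map(words, stem=toy_stem):
    """Map each stem to the most frequent surface form that produced it."""
    forms = defaultdict(Counter)
    for word in words:
        forms[stem(word)][word] += 1
    # Ties are broken by the shortest form, then alphabetically.
    return {
        s: min(counts, key=lambda w: (-counts[w], len(w), w))
        for s, counts in forms.items()
    }


words = "computer computers computing rotate rotated rotating rotate".split()
stem_map = build_stem_map(words)

# Replace every word by the representative form of its stem.
readable = [stem_map[toy_stem(w)] for w in words]
print(stem_map)
print(readable)
```

To try this with your own code, pass your stemmer in place of `toy_stem`, e.g. `stem=lambda w: porter.stem(w, 0, len(w) - 1)`. Note that the representative is simply whichever form is most common in your text, so it is not guaranteed to be the true lemma.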