I have a list of (German) stopwords that I want to use to filter those same words out of an input text. It looks like this:
stopwortlist = ['ab', 'aber','abgesehen', 'alle', 'allein', 'aller', 'alles']
text = input('please put in a Text')
# I found a way of doing this online, but it doesn't quite work,
# because it returns a list, and all I want is a text (with the words
# from the list filtered out)
def filterStopwords(eingabeText, stopwords):
    out = [word for word in eingabeText if word not in stopwords]
    return out
How should I change the function to get the filtered text back? Many thanks in advance.
Answer 0 (score: 2)
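Split the incoming text into words (otherwise you iterate over its individual characters), filter out the stopwords, and then join the resulting list back into a string.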
stopwortlist = ['ab', 'aber','abgesehen', 'alle', 'allein', 'aller', 'alles']
text = 'Some text ab aber with stopwords allein in'
def filterStopwords(eingabeText, stopwords):
    out = [word for word in eingabeText.split() if word not in stopwords]
    return ' '.join(out)
filterStopwords(text, stopwortlist) # => 'Some text with stopwords in'
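To see why the question's version did not work: iterating directly over a string yields single characters, so none of the stopwords (all of them at least two letters long) can ever match, and the "filtered" result is just the text split into characters. A small comparison, using a made-up sample text:

```python
stopwortlist = ['ab', 'aber', 'abgesehen', 'alle', 'allein', 'aller', 'alles']
text = 'alle ab hier'  # hypothetical sample input

# Iterating a str yields characters, so nothing is ever filtered out
chars = [word for word in text if word not in stopwortlist]

# str.split() with no argument splits on runs of whitespace and yields words
words = [word for word in text.split() if word not in stopwortlist]

print(chars)            # every character of text, spaces included
print(words)            # ['hier']
print(' '.join(words))  # hier
```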
Answer 1 (score: -1)
Here is a one-liner using filter() and str.join().
stopwortlist = ['ab', 'aber','abgesehen', 'alle', 'allein', 'aller', 'alles']
text = 'There are ab aber multiple allein abgesehen words in alles this ab list'
print(" ".join(filter(lambda x: x not in stopwortlist, text.split())))
#Output
There are multiple words in this list
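This uses a lambda function to check whether each word is in stopwortlist. filter() keeps only the words for which the lambda returns True, so the stopwords are dropped before the remaining words are joined back into a string.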
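Both answers compare tokens exactly, so a capitalized "Alle" at the start of a sentence or "alles." with a trailing period would not be removed. The sketch below is one possible extension, assuming you want matching to ignore case and surrounding punctuation. It stores the stopwords in a set (constant-time membership tests), compares case-folded tokens with ASCII punctuation (string.punctuation) stripped, and returns kept tokens unchanged. The function name filter_stopwords is hypothetical, and German quotation marks such as „ “ are not in string.punctuation, so they would need to be added by hand.

```python
import string

# Stopwords from the question, stored as a set for fast membership tests
STOPWORDS = {'ab', 'aber', 'abgesehen', 'alle', 'allein', 'aller', 'alles'}


def filter_stopwords(text, stopwords=STOPWORDS):
    """Drop whitespace-separated tokens whose core word is a stopword.

    The comparison ignores case and leading/trailing ASCII punctuation;
    tokens that are kept are returned exactly as they appeared.
    """
    folded = {w.casefold() for w in stopwords}
    kept = [
        token for token in text.split()
        if token.strip(string.punctuation).casefold() not in folded
    ]
    return ' '.join(kept)


print(filter_stopwords('Alle Wege, aber nicht alles.'))  # Wege, nicht
```

Note that a removed token takes its punctuation with it ("alles." disappears together with the period), which is usually acceptable for stopword filtering but not for producing readable sentences.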