Filtering text by a list in Python

Time: 2016-07-07 15:15:30

Tags: python python-2.7 python-3.x

I have a list of (German) stopwords that I want to use to filter those same words out of an input text. It looks like this:

stopwortlist = ['ab', 'aber','abgesehen', 'alle', 'allein', 'aller', 'alles']
text = input('please put in a Text')
# I found a way of filtering them online, but it doesn't quite work:
# it returns a list, and all I want is a text (with the words from
# the list filtered out).

def filterStopwords(eingabeText, stopwords):
    out = [word for word in eingabeText if word not in stopwords]
    return out

How should I modify the function to get that result? Many thanks in advance.

2 answers:

Answer 0: (score: 2)

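Split the incoming text into words (otherwise you are iterating over its characters), filter out the stopwords, and then join the resulting list back into a string.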

stopwortlist = ['ab', 'aber','abgesehen', 'alle', 'allein', 'aller', 'alles']
text = 'Some text ab aber with stopwords allein in'

def filterStopwords(eingabeText, stopwords):
    out = [word for word in eingabeText.split() if word not in stopwords]
    return ' '.join(out)

filterStopwords(text, stopwortlist) # => 'Some text with stopwords in'
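
Note that the comparison above is exact, so 'Aber' at the start of a sentence or 'alles,' followed by a comma would not be removed. A minimal sketch of a looser variant, assuming that case and surrounding punctuation should be ignored (the question does not say so; the name filter_stopwords_loose is hypothetical):

```python
import string

stopwortlist = ['ab', 'aber', 'abgesehen', 'alle', 'allein', 'aller', 'alles']

def filter_stopwords_loose(eingabe_text, stopwords):
    # Assumption: matching should ignore case, so normalise with casefold().
    # A set also makes each membership test cheap.
    stopword_set = {w.casefold() for w in stopwords}
    kept = []
    for word in eingabe_text.split():
        # Compare without surrounding ASCII punctuation,
        # but keep the original token when it is not a stopword.
        key = word.strip(string.punctuation).casefold()
        if key not in stopword_set:
            kept.append(word)
    return ' '.join(kept)

print(filter_stopwords_loose('Aber alles, was ich will, ist Text', stopwortlist))
# => was ich will, ist Text
```

A removed token takes its attached punctuation with it (the comma after 'alles' disappears here), which may or may not be what you want.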

Answer 1: (score: -1)

Here is a one-liner using the filter and join methods.

stopwortlist = ['ab', 'aber','abgesehen', 'alle', 'allein', 'aller', 'alles']
text = 'There are ab aber multiple allein abgesehen words in alles this ab list'

# The parenthesised form of print works on both Python 2.7 and Python 3.
print(" ".join(filter(lambda x: x not in stopwortlist, text.split())))

#Output
There are multiple words in this list

This basically uses a lambda function to check whether each word is in stopwortlist and, if it is, filters it out of the string.
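
To make the steps of the one-liner visible, here is a small sketch that splits it up and compares it with an equivalent generator expression. The variable names kept, result_filter and result_genexp are only illustrative. In Python 3, filter returns a lazy iterator (in Python 2 it returns a list), and join accepts either.

```python
stopwortlist = ['ab', 'aber', 'abgesehen', 'alle', 'allein', 'aller', 'alles']
text = 'There are ab aber multiple allein abgesehen words in alles this ab list'

# filter() keeps only the words for which the lambda returns True.
kept = filter(lambda x: x not in stopwortlist, text.split())
result_filter = " ".join(kept)

# The same filtering written as a generator expression, without a lambda.
result_genexp = " ".join(w for w in text.split() if w not in stopwortlist)

print(result_filter)                   # There are multiple words in this list
print(result_filter == result_genexp)  # True
```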