Python: Remove strings from a list if they contain specific keywords

Date: 2015-02-17 16:15:56

Tags: python string list filter

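I am trying to exclude certain strings from a list of strings if they contain certain words.

For example, if the word 'cinnamon', 'fruit', or 'eat' appears in a string, I want to exclude that string from the list.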

['RT @haussera: Access to Apple Pay customer data, no, but another way? everybody wins - MarketWatch http://t.co/Fm3LE2iTkY', "Landed in the US, tired w horrible migrane. The only thing helping- Connie's new song on repeat. #SoGood #Nashville https://t.co/AscR4VUkMP", 'I wish jacob would be my cinnamon apple', "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight", 'HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw', '@hot1079atl Let me know what you think of the new single "Mirage "\nhttps://t.co/k8DJ7oxkyg', 'RT @SWNProductions: Hey All so we have a new iTunes listing due to our old one getting messed up please resubscribe via the following https…', 'Shawty go them apple bottoms jeans and the boots with the furrrr with furrrr the whole club is looking at her', 'I highly recommend you use MyMedia - a powerfull download manager for the iPhone/iPad.  http://t.co/TWmYhgKwBH', 'Alusckが失われた時間の異常を解消しました http://t.co/peYgajYvQY http://t.co/sN3jAJnd1I', 'Театр радует туземцев! Теперь мой остров стал еще круче! http://t.co/EApBrIGghO #iphone, #iphonegames, #gameinsight', 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt', 'Я выполнил задание "Подключаем резервы"! Заходите ко мне в гости! http://t.co/ZReExwwbxh #iphone #iphonegames #gameinsight', "RT @Louis_Tomlinson: @JennSelby Google 'original apple logo' and you will see the one printed on my shirt that you reported on. Trying to l…", "I've collected 4,100 gold coins! http://t.co/JZLQJdRtLG #iphone, #iphonegames, #gameinsight", "I've collected 28,800 gold coins! http://t.co/r3qXNHwUdp #iphone, #iphonegames, #gameinsight", 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt']

keywordFilter=['eat','cinnamon','fruit']
for sent in list:
    for word in keywordFilter:
        if word in sent:
            list.remove(sent)

But it does not filter out the keywords I want and returns the original list. Does anyone know why?

First edit:

import json
from json import *

tweets=[]
tweetsF=[]  # collects the 'text' field of each tweet

for line in open('apple.json'):
    try:
        tweets.append(json.loads(line))
    except:
        pass

keywordFilter=set(['pie','juice','cinnamon'])

for tweet in tweets:
    for key, value in tweet.items():
        if key=='text':
            tweetsF.append(value)

print(type(tweetsF))
print(len(tweetsF))

tweetsFBK=[sent for sent in tweetsF if not any(word in sent for word in keywordFilter)]
print(type(tweetsFBK))    
print(len(tweetsFBK))

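The above is my code so far. Up to tweetsF, the strings are stored fine, and I then try to exclude strings using keywordFilter.

However, tweetsFBK gives me 0 (nothing). Does anyone know why?

3 answers:

Answer 0 (score: 4)

Here is one solution: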

list = [sent for sent in list 
    if not any(word in sent for word in keywordFilter)]

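This removes every string that contains one of the words in keywordFilter as a substring. For example, it removes the second string because it contains the word repeat (and eat is a substring of repeat).

If you want to avoid this, you can do the following: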
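
To see the substring behaviour concretely, here is a minimal sketch. The three sample strings are shortened or made up for illustration, and the names `sentences` and `kept` are assumptions, not taken from the question:

```python
# Illustrative data: shortened versions of strings like those in the question.
keywordFilter = ['eat', 'cinnamon', 'fruit']
sentences = [
    'I wish jacob would be my cinnamon apple',   # contains the keyword "cinnamon"
    "Connie's new song on repeat.",              # contains "eat" only inside "repeat"
    'Our iPhone 7',                              # contains no keyword
]

# `word in sent` is a substring test, so "eat" also matches inside "repeat".
kept = [sent for sent in sentences
        if not any(word in sent for word in keywordFilter)]

print(kept)  # ['Our iPhone 7']
```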

list = [sent for sent in list 
    if not any(word in sent.split(' ') for word in keywordFilter)]

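This removes only the strings that contain one of the words in keywordFilter as a whole word (that is, as a token delimited by spaces in the sentence).

Answer 1 (score: 3)

You can use any inside a list comprehension to filter: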
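
Note that `sent.split(' ')` leaves punctuation attached to each token, so a token such as 'cinnamon,' does not equal 'cinnamon'. One possible refinement, which is not part of the original answer, is a regular expression with `\b` word boundaries. The sketch below assumes that case-insensitive matching is wanted; the sample strings and the name `pattern` are illustrative:

```python
import re

keywordFilter = ['eat', 'cinnamon', 'fruit']

# Build one pattern equivalent to r'\b(?:eat|cinnamon|fruit)\b'.
# re.escape guards against keywords that contain regex metacharacters.
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(word) for word in keywordFilter) + r')\b',
    re.IGNORECASE,  # assumption: "Cinnamon" should match as well
)

sentences = [
    'I love Cinnamon, really',   # whole word, followed by a comma
    'new song on repeat',        # "eat" only as part of "repeat"
    'time to eat!',              # whole word, followed by "!"
]

# Keep only the strings in which no keyword appears as a whole word.
kept = [sent for sent in sentences if not pattern.search(sent)]
print(kept)  # ['new song on repeat']
```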

original_list = ['RT @haussera: Access to Apple Pay customer data, no, but another way? everybody wins - MarketWatch http://t.co/Fm3LE2iTkY', "Landed in the US, tired w horrible migrane. The only thing helping- Connie's new song on repeat. #SoGood #Nashville https://t.co/AscR4VUkMP", 'I wish jacob would be my cinnamon apple', "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight", 'HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw', '@hot1079atl Let me know what you think of the new single "Mirage "\nhttps://t.co/k8DJ7oxkyg', 'RT @SWNProductions: Hey All so we have a new iTunes listing due to our old one getting messed up please resubscribe via the following https…', 'Shawty go them apple bottoms jeans and the boots with the furrrr with furrrr the whole club is looking at her', 'I highly recommend you use MyMedia - a powerfull download manager for the iPhone/iPad.  http://t.co/TWmYhgKwBH', 'Alusckが失われた時間の異常を解消しました http://t.co/peYgajYvQY http://t.co/sN3jAJnd1I', 'Театр радует туземцев! Теперь мой остров стал еще круче! http://t.co/EApBrIGghO #iphone, #iphonegames, #gameinsight', 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt', 'Я выполнил задание "Подключаем резервы"! Заходите ко мне в гости! http://t.co/ZReExwwbxh #iphone #iphonegames #gameinsight', "RT @Louis_Tomlinson: @JennSelby Google 'original apple logo' and you will see the one printed on my shirt that you reported on. Trying to l…", "I've collected 4,100 gold coins! http://t.co/JZLQJdRtLG #iphone, #iphonegames, #gameinsight", "I've collected 28,800 gold coins! http://t.co/r3qXNHwUdp #iphone, #iphonegames, #gameinsight", 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt']

keywordFilter = set(['eat','cinnamon','fruit'])

# Keep only the strings that contain none of the keywords (substring match).
filtered_list = [s for s in original_list if not any(word in s for word in keywordFilter)]
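
As background on why a comprehension is preferable to calling `remove` inside a loop, as the question does: removing items from a list while iterating over it shifts the remaining items, so the loop can skip elements. The sketch below uses made-up data and hypothetical names (`contains_keyword`, `fragile`, `safe`) to show the effect, followed by slice assignment as one way to update the same list object in place:

```python
keywordFilter = {'eat', 'cinnamon', 'fruit'}

def contains_keyword(sent):
    """Return True if any keyword occurs in sent as a substring."""
    return any(word in sent for word in keywordFilter)

# Fragile: after a removal, the next element slides into the current
# position and the loop never visits it.
fragile = ['eat now', 'fruit salad', 'hello']
for sent in fragile:
    if contains_keyword(sent):
        fragile.remove(sent)
print(fragile)  # ['fruit salad', 'hello']  ('fruit salad' was skipped)

# Safer: build the filtered list first, then assign it back in place.
safe = ['eat now', 'fruit salad', 'hello']
safe[:] = [sent for sent in safe if not contains_keyword(sent)]
print(safe)  # ['hello']
```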

Answer 2 (score: 0)

Simple, if a bit long-winded :)

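final_list = []
for sentence in original_list:
    kept_words = []
    for token in sentence.split(" "):
        # Keep the token only if no keyword occurs inside it.
        if not any(keyword in token for keyword in keywordFilter):
            kept_words.append(token)
    # Note: this drops the matching words from each string, not the string itself.
    final_list.append(" ".join(kept_words))
print(final_list)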