Checking a dictionary for keywords from 2 separate lists

Time: 2015-06-17 20:44:54

Tags: python string list loops dictionary

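I have a dictionary where the keys are IDs and the values are strings. I also have two separate lists of keywords. I need to pick out all keys in the dictionary whose value contains at least one keyword from list 1 and at least one keyword from list 2. I am confused about how to approach this. Please help.

This is what I have so far: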

import csv

# Load all data from al.csv into a dictionary where the key is column 1 (the tweet ID) and the value is the whole row, including the tweet ID.
reader = csv.reader(open('al.csv', 'r'))
overallDict = {}
for rows in reader:
    k = rows[0]
    v = rows[0] + ',' + rows[1] + ',' + rows[2] + ',' + rows[3] + ',' + rows[4] + ',' + rows[5] + ',' + rows[6] + ',' + rows[7] + ',' + rows[8] + ',' + rows[9]
    overallDict[k] = v

# The following lines load the keywords list
with open('slangNames.txt') as f:
    slangs = f.readlines()

# To strip new-line and prepare data into finished keywords list
strippedSlangs = []
for elements in slangs:
    elements = elements.strip()
    strippedSlangs.append(elements)

# The following lines load the risks list
with open('riskNames.txt') as f:
    risks = f.readlines()

# To strip new-line and prepare data into finished risks list
strippedRisks = []
for things in risks:
    things = things.strip()
    strippedRisks.append(things)

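Say List1 = [opium, Christmas, weed] and List2 = [drug, harmful, bad] and dictionary = {213432: 'opium is harmful to health', 321234: 'Christmas is good', 543678: 'weed is bad'}

The desired output needs to be a list: Output: [213432, 543678], because these two tweets each contain at least one value from List1 and at least one value from List2.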

1 answer:

Answer 0: (score: 0)

First, I had to rewrite your code to make it easier to figure out what it is doing:

import csv

strippedRisks = set()
strippedSlangs = set()
overallDict = {}

with open('al.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        overallDict[row[0]] = ",".join(row[1:])

with open('slangNames.txt') as f:
    for line in f:
        elements = line.strip()
        strippedSlangs.add(elements)

with open('riskNames.txt') as f:
    for line in f:
        things = line.strip()
        strippedRisks.add(things)

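OK. You want to know which keys in the dictionary have values containing words from each list? In other words, you want to know which of the dictionary's values contain a disallowed word.

You can do it like this: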

for key, value in overallDict.items():
    # Splitting on ',' yields whole CSV fields, so a field must equal a keyword to match
    fields = set(value.split(','))
    if fields.intersection(strippedSlangs):
        # some fields appear in strippedSlangs
        pass
    elif fields.intersection(strippedRisks):
        # some fields appear in strippedRisks
        pass
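
The tests above work because an empty set is falsy, so `intersection` can go straight into an `if`. Here is a minimal sketch of that behaviour, using a made-up row rather than real data from al.csv. It also shows that splitting on commas compares whole fields, while splitting on whitespace compares individual words. The names `slangs`, `row`, `fields` and `words` are illustrative only.

```python
# Hypothetical keyword set and row, for illustration only
slangs = {"opium", "Christmas", "weed"}
row = "213432,opium is harmful to health"

# Splitting on ',' gives whole fields: {"213432", "opium is harmful to health"}
fields = set(row.split(","))

# Replacing commas with spaces and splitting on whitespace gives single words
words = set(row.replace(",", " ").split())

field_hits = fields.intersection(slangs)  # empty set, which is falsy
word_hits = words.intersection(slangs)    # {"opium"}, which is truthy

print(bool(field_hits), bool(word_hits))  # False True
```

If the keywords are expected to appear inside longer text fields, a word-level comparison like `words` is probably closer to what the question describes.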

However, now that I have seen what you are trying to do, I would just use sets from the start and build the disallowed words first:

import csv

strippedRisks = set()
strippedSlangs = set()
overallDict = {}

with open('slangNames.txt') as f:
    for line in f:
        elements = line.strip()
        strippedSlangs.add(elements)

with open('riskNames.txt') as f:
    for line in f:
        things = line.strip()
        strippedRisks.add(things)

with open('al.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        values = set(row[1:])
        if strippedRisks.intersection(values) and strippedSlangs.intersection(values):
            # Words in both bad-word lists. Do we skip these or save them?
            pass
        else:
            overallDict[row[0]] = values

I believe that is what you are trying to accomplish, but I am not entirely sure.
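
For comparison, here is a minimal sketch that produces the output the question asks for. It assumes a keyword counts as "contained" when it occurs anywhere in the value as a substring, which the question does not state. The function name `keys_matching_both` and the variable `tweets` are hypothetical, and the data is an English stand-in for the question's example.

```python
def keys_matching_both(data, list1, list2):
    """Return the keys whose value contains a keyword from list1 AND one from list2."""
    matches = []
    for key, text in data.items():
        # Assumption: substring containment counts as a match
        has_first = any(word in text for word in list1)
        has_second = any(word in text for word in list2)
        if has_first and has_second:
            matches.append(key)
    return matches


# Sample data modelled on the question's example
list1 = ["opium", "Christmas", "weed"]
list2 = ["drug", "harmful", "bad"]
tweets = {
    213432: "opium is harmful to health",
    321234: "Christmas is good",
    543678: "weed is bad",
}

print(keys_matching_both(tweets, list1, list2))  # [213432, 543678]
```

Substring matching is simple but loose: "bad" would also match "badge". If whole-word matches are needed, splitting each value into words and intersecting with keyword sets may be a better fit.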