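I have a dictionary whose keys are IDs and whose values are strings. I also have two separate keyword lists. I need to pick out every key in the dictionary whose value contains at least one keyword from list 1 and at least one keyword from list 2. I'm confused about how to approach this. Please help.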
So far, this is what I have:
```python
import csv

# Load all data from al.csv into a dictionary: the key is column 1 (the tweet ID)
# and the value is the whole row, including the tweet ID, joined with commas.
overallDict = {}
with open('al.csv', 'r') as f:
    reader = csv.reader(f)
    for rows in reader:
        k = rows[0]
        v = ','.join(rows[0:10])  # first 10 columns, comma-separated
        overallDict[k] = v

# Load the slang keyword list
with open('slangNames.txt') as f:
    slangs = f.readlines()

# Strip the trailing newlines to get the finished keyword list
strippedSlangs = []
for elements in slangs:
    elements = elements.strip()
    strippedSlangs.append(elements)

# Load the risk keyword list
with open('riskNames.txt') as f:
    risks = f.readlines()

# Strip the trailing newlines to get the finished risk list
strippedRisks = []
for things in risks:
    things = things.strip()
    strippedRisks.append(things)
```
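The two keyword-loading steps above repeat the same read-and-strip pattern. As a minimal sketch (the helper name load_keywords is hypothetical, not something from the question), they could be folded into one function that also drops blank lines. This matters because a blank line would otherwise become an empty-string keyword, and '' is a substring of every string:

```python
import os
import tempfile


def load_keywords(path):
    """Read one keyword per line, dropping newlines and blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Demo with a throwaway file standing in for slangNames.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('opium\nchristmas\n\nweed\n')
    demo_path = tmp.name

demo_keywords = load_keywords(demo_path)
print(demo_keywords)  # ['opium', 'christmas', 'weed']
os.remove(demo_path)
```

Assuming one keyword per line, the loading code would then reduce to strippedSlangs = load_keywords('slangNames.txt') and strippedRisks = load_keywords('riskNames.txt').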
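Say List1 = ['opium', 'christmas', 'weed'], List2 = ['drug', 'harmful', 'bad'], and the dictionary is {213432: 'opium is harmful to health', 321234: 'christmas is good', 543678: 'weed is bad'}.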
The desired output is a list: [213432, 543678], because each of those two tweets contains at least one value from List1 and at least one value from List2.
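For the example above, a minimal sketch of the check could look like this. It assumes plain, case-insensitive substring matching (a keyword counts wherever it appears inside the text), and the function name matching_ids is hypothetical:

```python
# Example data from the question (keywords translated to English).
list1 = ['opium', 'christmas', 'weed']
list2 = ['drug', 'harmful', 'bad']
tweets = {
    213432: 'opium is harmful to health',
    321234: 'christmas is good',
    543678: 'weed is bad',
}


def matching_ids(tweets, list1, list2):
    """Return the IDs whose text contains at least one keyword from each list."""
    result = []
    for tweet_id, text in tweets.items():
        lowered = text.lower()
        # any() stops at the first keyword found as a substring
        in_first = any(k.lower() in lowered for k in list1)
        in_second = any(k.lower() in lowered for k in list2)
        if in_first and in_second:
            result.append(tweet_id)
    return result


print(matching_ids(tweets, list1, list2))  # [213432, 543678]
```

Substring matching is the simplest reading of "contains", but it also matches inside longer words (for example, bad would match badge). Splitting the text into words and comparing whole words avoids that, at the cost of missing words that have punctuation attached.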
Answer:
First, I had to rewrite your code to make it easier to figure out what it's doing:
```python
import csv

strippedRisks = set()
strippedSlangs = set()
overallDict = {}

# Map tweet ID -> the remaining columns joined with commas
with open('al.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        overallDict[row[0]] = ",".join(row[1:])

with open('slangNames.txt') as f:
    for line in f:
        elements = line.strip()
        strippedSlangs.add(elements)

with open('riskNames.txt') as f:
    for line in f:
        things = line.strip()
        strippedRisks.add(things)
```
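OK. So you want to know which keys in the dictionary have values that appear in each list? In other words, you want to know which values in the dictionary contain a disallowed word.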
You could do it like this:
```python
for key, value in overallDict.items():
    fields = set(value.split(','))
    if fields.intersection(strippedSlangs):
        # some words appear in strippedSlangs
        pass
    elif fields.intersection(strippedRisks):
        # some words appear in strippedRisks
        pass
```
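The loop above splits each value on commas, which yields CSV fields rather than individual words, and its elif branch only runs when the first check fails, so it never confirms that both lists match. A small sketch, assuming keywords are single whitespace-separated words (contains_keyword is a hypothetical helper):

```python
def contains_keyword(text, keywords):
    """True if any whitespace-separated word of text is in the keywords set."""
    words = set(text.lower().split())
    return bool(words & keywords)


slangs = {'opium', 'christmas', 'weed'}
risks = {'drug', 'harmful', 'bad'}
text = 'opium is harmful to health'

# Splitting on ',' keeps the whole sentence as one item, so nothing matches.
print(set(text.split(',')) & slangs)  # set()

# Both checks must pass, so combine them with `and` instead of if/elif.
print(contains_keyword(text, slangs) and contains_keyword(text, risks))  # True
```

Words with punctuation attached (such as bad!) would not match unless the punctuation is stripped first.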
But now that I've seen what you're trying to do, I'd just start from scratch with sets and build the disallowed-word sets first:
```python
import csv

strippedRisks = set()
strippedSlangs = set()
overallDict = {}

with open('slangNames.txt') as f:
    for line in f:
        elements = line.strip()
        strippedSlangs.add(elements)

with open('riskNames.txt') as f:
    for line in f:
        things = line.strip()
        strippedRisks.add(things)

with open('al.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        values = set(row[1:])
        if strippedRisks.intersection(values) and strippedSlangs.intersection(values):
            # Words in both bad-word lists. Do we skip these or save them?
            pass
        else:
            overallDict[row[0]] = values
```
I believe that's what you're trying to accomplish, but I'm not entirely sure.
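If what you need is the list of IDs from the question rather than a dictionary of the remaining rows, the same sets approach can collect row[0] whenever both intersections are non-empty. This sketch assumes the ID is in the first column and the tweet text is somewhere in the remaining columns; io.StringIO stands in for al.csv, and ids_with_both is a hypothetical name:

```python
import csv
import io


def ids_with_both(rows, slangs, risks):
    """Collect row[0] for every row whose other columns mention both kinds of keyword."""
    matches = []
    for row in rows:
        # Treat all non-ID columns as one block of text, then split it into words.
        words = set(' '.join(row[1:]).lower().split())
        if words & slangs and words & risks:
            matches.append(row[0])
    return matches


# In-memory stand-in for al.csv (ID, tweet text)
sample = io.StringIO(
    '213432,opium is harmful to health\n'
    '321234,christmas is good\n'
    '543678,weed is bad\n'
)
slangs = {'opium', 'christmas', 'weed'}
risks = {'drug', 'harmful', 'bad'}
result = ids_with_both(csv.reader(sample), slangs, risks)
print(result)  # ['213432', '543678']
```

To run it on the real file, pass csv.reader(f) for an open al.csv. csv.reader returns every field as a string, so convert with int(...) if you need the IDs as integers.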