我有5个单词清单。我需要查找出现在2个以上列表中的所有单词。任何单词在列表中都可以出现多次。
我用过collections.Counter,但是它只返回单个列表中所有单词的频率。
{{1}}
例如,这些列表的输出应为['log':4,'branch':3],因为'log'存在于4个列表中,而'branch'存在于3个列表中。
答案 0 :(得分:3)
没有Counter
:
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
all_lists = [a, b, c, d, e]
all_words = set().union(w for l in all_lists for w in l)
out = {}
for word in all_words:
s = sum(word in l for l in all_lists)
if s > 2:
out[word] = s
print(out)
打印:
{'branch': 3, 'log': 4}
编辑(以打印列表名称):
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
all_lists = {'a':a, 'b':b, 'c':c, 'd':d, 'e':e}
all_words = set().union(w for l in all_lists.values() for w in l)
out = {}
for word in all_words:
s = sum(word in l for l in all_lists.values())
if s > 2:
out[word] = s
for k, v in out.items():
print('Word : {}'.format(k))
print('Count: {}'.format(v))
print('Lists: {}'.format(', '.join(kk for kk, vv in all_lists.items() if k in vv )))
print()
打印:
Word : log
Count: 4
Lists: a, c, d, e
Word : branch
Count: 3
Lists: b, d, e
答案 1 :(得分:1)
您可以sum
个计数器-从空的Counter()
开始:
from collections import Counter
lists = [a, b, c, d, e]
total = sum((Counter(set(lst)) for lst in lists), Counter())
# Counter({'log': 4, 'branch': 3, 'tree': 2, 'boat': 2, 'water': 2,
# 'wood': 1, 'bark': 1, 'house': 1, 'mill': 1, 'rock': 1})
res = {word: occ for word, occ in total.items() if occ > 2}
# {'log': 4, 'branch': 3}
请注意,我首先将所有列表转换为set
,以避免重复计数同一列表中多次出现的单词。
如果您需要知道单词的来源列表,可以尝试以下操作:
lists = {"a": a, "b": b, "c": c, "d": d, "e": e}
total = sum((Counter(set(lst)) for lst in lists.values()), Counter())
# Counter({'log': 4, 'branch': 3, 'tree': 2, 'boat': 2, 'water': 2,
# 'wood': 1, 'bark': 1, 'house': 1, 'mill': 1, 'rock': 1})
res = {word: occ for word, occ in total.items() if occ > 2}
# {'log': 4, 'branch': 3}
word_appears_in = {
word: [key for key, value in lists.items() if word in value] for word in res
}
# {'log': ['a', 'c', 'd', 'e'], 'branch': ['b', 'd', 'e']}