I have a list of sets:
a = [{'foo','cpu','phone'},{'foo','mouse'}, {'dog','cat'}, {'cpu'}]
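Expected result:
I want to look at each individual string, count its occurrences, and return, in the original format, every string whose count x >= 2: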
a = [{'foo','cpu'}, {'foo'}, {'cpu'}]
This is what I have so far, but I am stuck on the last part, where I need to append to the new list:
from collections import Counter
counter = Counter()
for a_set in a:
    # Update the counter with the occurrences of each word
    counter.update(a_set)
result = []
for a_set in a:
    for word in a_set:
        if counter[word] >= 2:
            # Not sure how I should append my new set below.
            result.append(a_set)
            break
print(result)
Answer 0: (score: 0)
You are just appending the original set. Instead, you should build a new set from the words that occur at least twice.
result = []
for a_set in a:
    new_set = {
        word for word in a_set
        if counter[word] >= 2
    }
    if new_set:  # check if new set is not empty
        result.append(new_set)
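The snippet above relies on the `counter` built in the question. A minimal self-contained sketch of the same idea follows; the helper name `keep_frequent` and its `min_count` parameter are hypothetical, introduced only for illustration.

```python
from collections import Counter


def keep_frequent(sets, min_count=2):
    """Return new sets holding only the words counted at least min_count times.

    Hypothetical helper: the name and the min_count parameter are assumptions.
    """
    # Count every word across all sets.
    counter = Counter()
    for a_set in sets:
        counter.update(a_set)

    result = []
    for a_set in sets:
        # Build a new set instead of appending the original one.
        new_set = {word for word in a_set if counter[word] >= min_count}
        if new_set:  # skip sets that end up empty
            result.append(new_set)
    return result


a = [{'foo', 'cpu', 'phone'}, {'foo', 'mouse'}, {'dog', 'cat'}, {'cpu'}]
print(keep_frequent(a))
```

Because the words inside a single set are unique, the count of a word here equals the number of sets that contain it.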
Answer 1: (score: 0)
Alternatively, use the following short approach based on set intersection:
from collections import Counter
a = [{'foo','cpu','phone'},{'foo','mouse'}, {'dog','cat'}, {'cpu'}]
c = Counter([i for s in a for i in s])
valid_keys = {k for k,v in c.items() if v >= 2}
res = [s & valid_keys for s in a if s & valid_keys]
print(res) # [{'cpu', 'foo'}, {'foo'}, {'cpu'}]
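Answer 2: (score: 0)
This is what I ended up doing:
Build a counter, then iterate over the original list of sets, filtering out the items with a count below 2, and then filter out any empty sets.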
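The steps described above can be sketched as three explicit passes. This is an illustration under the question's sample data, not necessarily the code this answer originally used; the intermediate name `filtered` is an assumption.

```python
from collections import Counter

a = [{'foo', 'cpu', 'phone'}, {'foo', 'mouse'}, {'dog', 'cat'}, {'cpu'}]

# Step 1: build a counter over every word in every set.
counter = Counter(word for a_set in a for word in a_set)

# Step 2: drop the words counted fewer than 2 times from each set.
filtered = [{word for word in a_set if counter[word] >= 2} for a_set in a]

# Step 3: drop the sets that are now empty.
result = [a_set for a_set in filtered if a_set]

print(result)
```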