Is there any way to optimize this?
import itertools
data = [['apple', 'banana', 'banana'],['apple', 'strawberry'], ['banana', 'lemon']]
Text = itertools.chain(*data)
for i in list(set(Text)):
    print(i, sum([1 for j in data if i in j]))
Output:
strawberry 1
lemon 1
apple 2
banana 2
Answer 0 (score: 3)
from collections import Counter
c = Counter()
for d in data:
    c.update(set(d))
print(c)
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
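A minimal check of why the set(d) call matters, using the same data list: without it, a word repeated inside one document is counted once per occurrence instead of once per document. The names occurrences and documents are only for illustration.

```python
from collections import Counter

data = [['apple', 'banana', 'banana'], ['apple', 'strawberry'], ['banana', 'lemon']]

# Counting every occurrence: 'banana' appears twice in the first document.
occurrences = Counter()
for d in data:
    occurrences.update(d)

# Counting each word at most once per document, as in the answer above.
documents = Counter()
for d in data:
    documents.update(set(d))

print(occurrences['banana'])  # 3
print(documents['banana'])    # 2
```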
Answer 1 (score: 2)
Use a collections.Counter() object to count the number of documents each word appears in:
from collections import Counter
data = [['apple', 'banana', 'banana'], ['apple', 'strawberry'], ['banana', 'lemon']]
counts = Counter()
for document in data:
    # count unique words only; one count per document
    counts.update(set(document))
Demo:
>>> from collections import Counter
>>> data = [['apple', 'banana', 'banana'], ['apple', 'strawberry'], ['banana', 'lemon']]
>>> counts = Counter()
>>> for document in data:
...     # count unique words only; one count per document
...     counts.update(set(document))
...
>>> for word, documentcount in counts.most_common():
...     print(word, documentcount)
...
apple 2
banana 2
strawberry 1
lemon 1
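The same count can also be sketched as a single generator expression passed to Counter, with no explicit loop. This assumes Python 3 for the print call and is only a more compact spelling of the loop above, not a measured speedup.

```python
from collections import Counter

data = [['apple', 'banana', 'banana'], ['apple', 'strawberry'], ['banana', 'lemon']]

# Each document is reduced to its unique words before counting,
# so every word contributes at most one count per document.
counts = Counter(word for document in data for word in set(document))

for word, documentcount in counts.most_common():
    print(word, documentcount)
```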
Answer 2 (score: 1)
With Counter and itertools you can write it in a single line:
from collections import Counter
import itertools
Counter(itertools.chain(*map(set, data)))
Result:
Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
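A variant of the one-liner, assuming Python 2.6 or later: itertools.chain.from_iterable takes the iterable of sets directly instead of unpacking it into arguments with *, so data could also be a lazy iterator such as a generator of documents.

```python
from collections import Counter
from itertools import chain

data = [['apple', 'banana', 'banana'], ['apple', 'strawberry'], ['banana', 'lemon']]

# map(set, data) yields one set of unique words per document;
# chain.from_iterable flattens those sets lazily for Counter to consume.
counts = Counter(chain.from_iterable(map(set, data)))
print(counts)
```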
Answer 3 (score: 0)
Using only basic built-ins (set and dict):
res = {}
for lst in data:
    for word in set(lst):
        if word not in res:
            res[word] = 0
        res[word] += 1
print(res)
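Unlike the code in the question, which is O(n^2) because it rescans every document for each distinct word, this runs in O(n log(n)) or better: set and dict operations are O(1) on average, so the work is roughly linear in the total number of words.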
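The same dict-based algorithm can be sketched with collections.defaultdict, which supplies a default of 0 and removes the explicit membership check. This is a stylistic variant of the answer above with the same complexity.

```python
from collections import defaultdict

data = [['apple', 'banana', 'banana'], ['apple', 'strawberry'], ['banana', 'lemon']]

# defaultdict(int) returns 0 for a missing key, replacing
# the "if word not in res" branch in the answer above.
res = defaultdict(int)
for lst in data:
    for word in set(lst):
        res[word] += 1

print(dict(res))
```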