Question

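I have a list of words, Word_array, containing about 273,000 words. About 17,000 of them are unique, and those are stored in Word_arrayU.

I want the count for each unique word.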

# make bag of words
Word_arrayU = np.unique(Word_array)
wordBag = [['0','0'] for _ in range(len(Word_array))] # preallocate necessary space
i=0
while i< len(Word_arrayU): #for each unique word
    wordBag[i][0] = Word_arrayU[i]
    #I think this is the part that takes a long time.  summing up a list comprehension with a conditional.  Just seems sloppy
    wordBag[i][1]=sum([1 if x == Word_arrayU[i] else 0 for x in Word_array])
    i=i+1

Summing a list comprehension with a conditional just seems sloppy; is there a better way?

Answer 1

Since you are already using numpy.unique, just set return_counts=True in the call to unique:

import numpy as np

unique,  count = np.unique(Word_array, return_counts=True)

This gives you two arrays, the unique elements and their counts:

In [10]: arr = [1,3,2,11,3,4,5,2,3,4]

In [11]: unique,  count = np.unique(arr, return_counts=True)

In [12]: unique
Out[12]: array([ 1,  2,  3,  4,  5, 11])

In [13]: count
Out[13]: array([1, 2, 3, 2, 1, 1])
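To get back to the list-of-pairs shape the question uses for wordBag, the two arrays can be zipped together. This is a minimal sketch, assuming a small hypothetical Word_array in place of the real 273,000-word list:

```python
import numpy as np

# Hypothetical sample data standing in for the question's Word_array.
Word_array = ["word1", "word2", "word3", "word1", "word2", "word1"]

# np.unique returns the sorted unique values and, with return_counts=True,
# how often each one occurs.
unique, count = np.unique(Word_array, return_counts=True)

# Pair each unique word with its count. tolist() converts the NumPy
# scalars to plain Python str and int values.
wordBag = [[word, n] for word, n in zip(unique.tolist(), count.tolist())]

print(wordBag)  # [['word1', 3], ['word2', 2], ['word3', 1]]
```

If a lookup table is more useful than a list, dict(zip(unique.tolist(), count.tolist())) gives a word-to-count mapping instead.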

Answer 2

Based on @jonrsharpe's suggestion...

from collections import Counter

words = Counter()

words['foo'] += 1
words['foo'] += 1
words['bar'] += 1

Output:

Counter({'foo': 2, 'bar': 1})

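This is very convenient because you don't have to initialize an entry for each word first.

You can also initialize it directly from a list of words: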

Counter(['foo', 'foo', 'bar'])

Output:

Counter({'foo': 2, 'bar': 1})
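Once the Counter is built, two of its features tend to be handy for a bag of words: most_common for ranking, and the fact that looking up an unseen word returns 0 rather than raising KeyError. A short sketch with made-up sample words:

```python
from collections import Counter

# Made-up sample words, only for illustration.
words = Counter(["foo", "foo", "bar", "baz", "foo", "bar"])

# The two most frequent words, as (word, count) tuples in descending order.
top_two = words.most_common(2)

# A word that never appeared has a count of 0; no KeyError is raised
# and the key is not added to the Counter.
missing = words["qux"]

print(top_two)   # [('foo', 3), ('bar', 2)]
print(missing)   # 0
```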

Answer 3

I don't know about the most 'Pythonic' way, but definitely the easiest way is to use collections.Counter.

from collections import Counter

Word_array = ["word1", "word2", "word3", "word1", "word2", "word1"]

wordBag = Counter(Word_array).items()
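One thing to be aware of: in Python 3, items() returns a view object rather than a list, so it cannot be indexed like the wordBag[i][0] in the question. A minimal sketch of materializing it into a sorted list of [word, count] pairs, using the same sample data:

```python
from collections import Counter

Word_array = ["word1", "word2", "word3", "word1", "word2", "word1"]

counts = Counter(Word_array)

# items() is a dict view in Python 3; build a real list so it can be
# indexed, and sort it alphabetically by word.
wordBag = sorted([word, n] for word, n in counts.items())

print(wordBag[0])  # ['word1', 3]
```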

Answer 4

In Python 3, lists have a built-in count method, so you could make this more efficient by doing the following:

Word_arrayU = np.unique(Word_array)
wordBag = []
for uniqueWord in Word_arrayU:
    wordBag.append([uniqueWord, Word_array.count(uniqueWord)])
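A self-contained sketch of the list.count approach, assuming Word_array is a plain Python list (a NumPy array has no count method, so it would need .tolist() first). Here set() stands in for np.unique so the example needs no imports:

```python
Word_array = ["word1", "word2", "word3", "word1", "word2", "word1"]

# list.count scans the whole list each time it is called, so this loop
# still does one full pass over Word_array per unique word.
wordBag = [[word, Word_array.count(word)] for word in sorted(set(Word_array))]

print(wordBag)  # [['word1', 3], ['word2', 2], ['word3', 1]]
```

The scan runs in C, so it is faster than the summed list comprehension in the question, but with roughly 17,000 unique words and 273,000 total words it still performs about 17,000 passes over the list. A single-pass approach such as collections.Counter avoids that.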

Answer 5

If you want a solution that is less efficient than Counter but more transparent, you can use collections.defaultdict:

from collections import defaultdict
my_counter = defaultdict(int)
for word in Word_array:
    my_counter[word] += 1
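A runnable version of the loop above, with sample data added for illustration. It also shows one difference from Counter worth knowing: indexing a defaultdict with an unseen word inserts that word with a count of 0, whereas get() does not.

```python
from collections import Counter, defaultdict

Word_array = ["word1", "word2", "word3", "word1", "word2", "word1"]

# int() returns 0, so every new word starts at 0 before being incremented.
my_counter = defaultdict(int)
for word in Word_array:
    my_counter[word] += 1

# The result holds the same counts a Counter would produce.
same_as_counter = dict(my_counter) == dict(Counter(Word_array))

# get() reads a count without adding the key; plain indexing would add it.
absent = my_counter.get("word9", 0)
size_after_get = len(my_counter)

print(dict(my_counter))  # {'word1': 3, 'word2': 2, 'word3': 1}
```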

A "Pythonic" way to fill a bag of words
