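numpy - How to count occurrences of items in a nested list by index?

Time: 2013-11-23 04:35:13

Tags: python arrays list numpy

Hi, I would like to count how often each item of a list occurs at a given index of a nested list.

That is, if my list is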

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']

and my nested list is as follows:

[['Three' 'One' 'Ten']
 ['Three' 'Five' 'Nine']
 ['Two' 'Five' 'Three']
 ['Two' 'Three' 'Eight']
 ['One' 'Three' 'Nine']]

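What I would like to know is, for each item, how many times it occurs at each index: for example, how many times 'One' occurs at index 0, and so on.

I am using numpy arrays to build the lists and am creating the output from weighted random choices. I would like to run a test over, say, 1000 lists and count the occurrences per index, to determine how changes I make elsewhere in my program affect the end result.

I have found examples such as https://stackoverflow.com/a/10741692/461887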
import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
list(zip(ii.tolist(), y[ii].tolist()))
# [(1, 5), (2, 3), (5, 1), (25, 1)]

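But this does not appear to work with nested lists. I have also looked at indexing in the numpy cookbook, and at histogram and digitize in the example list, but I cannot seem to find a function that does this.

Updated to include example data output:

Given a nested list 100 rows deep:

{'One': 19, 'Two': 16, 'Three': 19, 'Four': 11, 'Five': 7, 'Six': 8, 'Seven': 4, 'Eight': 3,
            'Nine': 5, 'Ten': 1, 'Eleven': 2, 'Twelve': 1, 'Thirteen': 1, 'Fourteen': 3, 'Fifteen': 0}

Or, as in treddy's example: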

array([19, 16, 19, 11, 7, 8, 4, 3, 5, 1, 2, 1, 1, 3, 0])

3 answers:

Answer 0: (score: 2)

It would help if you added the example output you want to get, but for now it looks like collections.Counter will do the job:

>>> data = [['Three','One','Ten'],
...  ['Three','Five','Nine'],
...  ['Two','Five','Three'],
...  ['Two','Three','Eight'],
...  ['One','Three','Nine']]
... 
>>> 
>>> from collections import Counter
>>> [Counter(x) for x in data]
[Counter({'Three': 1, 'Ten': 1, 'One': 1}), Counter({'Nine': 1, 'Five': 1, 'Three': 1}), Counter({'Five': 1, 'Two': 1, 'Three': 1}), Counter({'Eight': 1, 'Two': 1, 'Three': 1}), Counter({'Nine': 1, 'Three': 1, 'One': 1})]

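Update

Now that you have provided the desired output, I think the idea is to flatten the list, use Counter to count the occurrences, and then create a dictionary (or an OrderedDict, if order matters to you):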

>>> from collections import Counter, OrderedDict
>>> c = Counter(e for l in data for e in l)
>>> c
Counter({'Three': 5, 'Two': 2, 'Nine': 2, 'Five': 2, 'One': 2, 'Ten': 1, 'Eight': 1})

Or, if you only need the first entry of each list:

>>> c = Counter(l[0] for l in data)
>>> c
Counter({'Three': 2, 'Two': 2, 'One': 1})

A plain dictionary:

>>> {x:c[x] for x in keys} 
{
    'Twelve': 0, 'Seven': 0,
    'Ten': 1, 'Fourteen': 0,
    'Nine': 2, 'Six': 0,
    'Three': 5, 'Two': 2,
    'Four': 0, 'Eleven': 0,
    'Five': 2, 'Thirteen': 0,
    'Eight': 1, 'One': 2, 'Fifteen': 0
}

Or an OrderedDict:

>>> OrderedDict((x, c[x]) for x in keys)
OrderedDict([('One', 2), ('Two', 2), ('Three', 5), ('Four', 0), ('Five', 2), ('Six', 0), ('Seven', 0), ('Eight', 1), ('Nine', 2), ('Ten', 1), ('Eleven', 0), ('Twelve', 0), ('Thirteen', 0), ('Fourteen', 0), ('Fifteen', 0)])

And, just in case: if you do not need the zeros in your output, you can query the Counter directly to get the number of occurrences:

>>> c['Nine']   # Key is in the Counter, returns number of occurrences
2
>>> c['Four']   # Key is not in the Counter, returns 0
0
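
To count every index position rather than only the first, one possible sketch transposes the rows with zip(*data) and builds one Counter per column. The names per_index and table are introduced here for illustration only and do not come from the answer above.

```python
from collections import Counter

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# zip(*data) transposes the rows, so each tuple holds every item found at one index.
per_index = [Counter(column) for column in zip(*data)]

# For each key, its count at index 0, 1, 2 (hypothetical helper structure).
table = {key: [counter[key] for counter in per_index] for key in keys}

print(table['One'])    # [1, 1, 0]
print(table['Three'])  # [2, 2, 1]
```

Because a Counter returns 0 for a missing key, keys that never occur at an index still appear in table with a zero.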

Answer 1: (score: 2)

The OP asked a numpy question. Counter and OrderedDict from collections will certainly work, but here is a numpy answer:

In [1]: # from original posting:
In [2]: keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
...:         'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
In [3]: data = [['Three', 'One', 'Ten'],
...:            ['Three', 'Five', 'Nine'],
...:            ['Two', 'Five', 'Three'],
...:            ['Two', 'Three', 'Eight'],
...:            ['One', 'Three', 'Nine']]
In [4]: # make it numpy
In [5]: import numpy as np
In [6]: keys = np.array(keys)
In [7]: data = np.array(data)
In [8]: # if you only want counts for column 0
In [9]: counts = np.sum(keys == data[:,[0]], axis=0)
In [10]: # view it
In [11]: list(zip(keys.tolist(), counts.tolist()))
Out[11]:
[('One', 1),
('Two', 2),
('Three', 2), ...
In [12]: # if you wanted counts for all columns (newaxis here sets-up 3D broadcasting)
In [13]: counts = np.sum(keys[:,np.newaxis,np.newaxis] == data, axis=1)
In [14]: # view it (you could use zip without pandas, this is just for looks)
In [15]: import pandas as pd
In [16]: pd.DataFrame(counts, index=keys)
Out[16]:
          0  1  2
One       1  1  0
Two       2  0  0
Three     2  2  1
Four      0  0  0
Five      0  2  0 ...
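
As a possible alternative to the broadcasting comparison, np.unique with return_counts=True tallies a single column directly. This is only a sketch: unlike the approach above, it lists only the values that actually occur, in sorted order, so keys with a count of zero are absent. The name first_column is introduced here for illustration.

```python
import numpy as np

data = np.array([['Three', 'One', 'Ten'],
                 ['Three', 'Five', 'Nine'],
                 ['Two', 'Five', 'Three'],
                 ['Two', 'Three', 'Eight'],
                 ['One', 'Three', 'Nine']])

# Distinct values in column 0 (sorted) and how often each one occurs.
values, counts = np.unique(data[:, 0], return_counts=True)

# Convert to plain Python types for a readable dictionary.
first_column = dict(zip(values.tolist(), counts.tolist()))
print(first_column)  # {'One': 1, 'Three': 2, 'Two': 2}
```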

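Answer 2: (score: 1)

You are right that numpy.bincount accepts a one-dimensional array-like object, so it cannot be used directly on a nested list or on an array with more than one dimension. However, you can use numpy array slicing to select the first column of your 2D array and then bin-count the occurrences of each number within the range of values in that column: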

import numpy

keys = numpy.arange(1, 16)  # don't really need to use this
two_dim_array_for_counting = numpy.array([[3, 1, 10],
                                          [3, 5, 9],
                                          [2, 5, 3],
                                          [2, 3, 8],
                                          [1, 3, 9]])
numpy.bincount(two_dim_array_for_counting[..., 0])  # count only the first column, across all rows
Out[36]: array([0, 1, 2, 2])  # this output means that the digit 0 occurs 0 times, 1 occurs once, 2 occurs twice, and 3 occurs twice

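No number larger than 3 occurs in the first column, so the output array has only 4 elements, which count the occurrences of the digits 0, 1, 2 and 3 in that column.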
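
To apply the same idea to the string data from the question and get one count per key, a rough sketch is to map each key to its position in keys and pass minlength to numpy.bincount so the result always has len(keys) entries. The code_of mapping and the 0-based codes are assumptions made for this illustration; they differ from the 1-based digits used above.

```python
import numpy as np

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# Hypothetical lookup: each key maps to its 0-based position in keys.
code_of = {key: i for i, key in enumerate(keys)}
codes = np.array([[code_of[item] for item in row] for row in data])

# minlength pads the result so that keys which never occur get a count of 0.
counts = np.bincount(codes[:, 0], minlength=len(keys))
print(counts.tolist())  # [1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The resulting array lines up with keys position by position, which matches the array form of the output requested in the question.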