Hello, I would like to count how often each item occurs at each index position across the lists in a nested list.
If my list of keys is
keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
and my nested list looks like this:
[['Three' 'One' 'Ten']
['Three' 'Five' 'Nine']
['Two' 'Five' 'Three']
['Two' 'Three' 'Eight']
['One' 'Three' 'Nine']]
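then what I want to know is how many times 'One' (and each of the other items) occurs at index 0, and likewise for the other indices.
I am using numpy arrays to build the lists and create the output from a weighted random selection. I would like to run a test over 1000 lists and count the occurrences per index, to determine how changes I make elsewhere in my program affect the final result.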
I found examples such as https://stackoverflow.com/a/10741692/461887:
import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
list(zip(ii, y[ii]))
# [(1, 5), (2, 3), (5, 1), (25, 1)]
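But this does not seem to work with nested lists. I have also looked through the indexing, histogram, and digitize entries in the NumPy cookbook and example list, but I cannot find a function that does this.
Update to include example output data:
Assuming the nested list is 100 lists deep: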
{'One': 19, 'Two': 16, 'Three': 19, 'Four': 11, 'Five': 7, 'Six': 8, 'Seven': 4, 'Eight': 3,
'Nine': 5, 'Ten': 1, 'Eleven': 2, 'Twelve': 1, 'Thirteen': 1, 'Fourteen': 3, 'Fifteen': 0}
Or, as in treddy's example:
array([19, 16, 19, 11, 7, 8, 4, 3, 5, 1, 2, 1, 1, 3, 0])
Answer 0 (score: 2)
It would help if you added an example of the output you want, but for now it looks like collections.Counter will do the job:
>>> data = [['Three','One','Ten'],
... ['Three','Five','Nine'],
... ['Two','Five','Three'],
... ['Two','Three','Eight'],
... ['One','Three','Nine']]
...
>>>
>>> from collections import Counter
>>> [Counter(x) for x in data]
[Counter({'Three': 1, 'Ten': 1, 'One': 1}), Counter({'Nine': 1, 'Five': 1, 'Three': 1}), Counter({'Five': 1, 'Two': 1, 'Three': 1}), Counter({'Eight': 1, 'Two': 1, 'Three': 1}), Counter({'Nine': 1, 'Three': 1, 'One': 1})]
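Now that you have provided the desired output, I think the idea is this: flatten the list, use Counter to count the occurrences, and then create a dictionary (or an OrderedDict, if order matters to you):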
>>> from collections import Counter, OrderedDict
>>> c = Counter(e for l in data for e in l)
>>> c
Counter({'Three': 5, 'Two': 2, 'Nine': 2, 'Five': 2, 'One': 2, 'Ten': 1, 'Eight': 1})
Or, if you only need the first entry of each list:
>>> c = Counter(l[0] for l in data)
>>> c
Counter({'Three': 2, 'Two': 2, 'One': 1})
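Simple dictionary (the output shown here and below comes from the flattened Counter over all entries, not the first-entry one):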
>>> {x:c[x] for x in keys}
{
'Twelve': 0, 'Seven': 0,
'Ten': 1, 'Fourteen': 0,
'Nine': 2, 'Six': 0,
'Three': 5, 'Two': 2,
'Four': 0, 'Eleven': 0,
'Five': 2, 'Thirteen': 0,
'Eight': 1, 'One': 2, 'Fifteen': 0
}
Or an OrderedDict:
>>> OrderedDict((x, c[x]) for x in keys)
OrderedDict([('One', 2), ('Two', 2), ('Three', 5), ('Four', 0), ('Five', 2), ('Six', 0), ('Seven', 0), ('Eight', 1), ('Nine', 2), ('Ten', 1), ('Eleven', 0), ('Twelve', 0), ('Thirteen', 0), ('Fourteen', 0), ('Fifteen', 0)])
And, just in case: if you do not need the zeros in your output, you can use the Counter directly to look up the number of occurrences:
>>> c['Nine'] # Key is in the Counter, returns number of occurrences
2
>>> c['Four'] # Key is not in the Counter, returns 0
0
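The first-entry count above can be extended to every index position. The following is only a minimal sketch, assuming every inner list has the same length; the names per_index and counts_by_index are introduced here for illustration and are not part of the original answer:

```python
from collections import Counter

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# zip(*data) transposes the rows, yielding one tuple per index position.
per_index = [Counter(column) for column in zip(*data)]

# For each index position, list the counts in the order of keys.
# A Counter returns 0 for keys it has never seen, so absent keys need no special case.
counts_by_index = [[counter[key] for key in keys] for counter in per_index]

print(per_index[0])        # Counter for index 0
print(counts_by_index[0])  # counts at index 0, ordered like keys
```

counts_by_index[i] then holds the counts at index i in the same order as keys, which is the array-style layout asked for in the question.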
Answer 1 (score: 2)
In [1]: # from original posting:
In [2]: keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
...: 'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
In [3]: data = [['Three', 'One', 'Ten'],
...: ['Three', 'Five', 'Nine'],
...: ['Two', 'Five', 'Three'],
...: ['Two', 'Three', 'Eight'],
...: ['One', 'Three', 'Nine']]
In [4]: # make it numpy
In [5]: import numpy as np
In [6]: keys = np.array(keys)
In [7]: data = np.array(data)
In [8]: # if you only want counts for column 0
In [9]: counts = np.sum(keys == data[:,[0]], axis=0)
In [10]: # view it
In [11]: list(zip(keys, counts))
Out[11]:
[('One', 1),
('Two', 2),
('Three', 2), ...
In [12]: # if you wanted counts for all columns (newaxis here sets-up 3D broadcasting)
In [13]: counts = np.sum(keys[:,np.newaxis,np.newaxis] == data, axis=1)
In [14]: # view it (you could use zip without pandas, this is just for looks)
In [15]: import pandas as pd
In [16]: pd.DataFrame(counts, index=keys)
Out[16]:
       0  1  2
One    1  1  0
Two    2  0  0
Three  2  2  1
Four   0  0  0
Five   0  2  0 ...
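For the 1000-list test mentioned in the question, the same broadcasting comparison can be applied to a larger array. This is only a sketch: the weights below are hypothetical stand-ins for the asker's weighted random generator, and each entry is drawn independently, which may differ from how the real lists are built.

```python
import numpy as np

keys = np.array(['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
                 'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen'])

# Hypothetical weights (assumption): earlier keys are more likely than later ones.
weights = np.arange(len(keys), 0, -1, dtype=float)
probabilities = weights / weights.sum()

rng = np.random.default_rng(0)  # fixed seed so the test run is repeatable
data = rng.choice(keys, size=(1000, 3), p=probabilities)

# Same 3D broadcasting as above: (15, 1, 1) == (1000, 3) -> (15, 1000, 3).
# Summing over axis 1 (the rows) leaves one count per key and column.
counts = np.sum(keys[:, np.newaxis, np.newaxis] == data, axis=1)

print(counts.shape)        # one row per key, one column per index position
print(counts.sum(axis=0))  # each column should sum to the number of rows
```

Comparing the counts array from two runs, one before and one after a change elsewhere in the program, is one way to see how that change shifts the distribution per index.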
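Answer 2 (score: 1)
You are correct that numpy.bincount accepts a 1D array-like object, so a nested list or an array with more than one dimension cannot be used with it directly. You can, however, use numpy array slicing to select the first column of your 2D array and bin-count the occurrences of each number within the range of values in that column: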
import numpy
keys = numpy.arange(1, 16)  # not really needed here
two_dim_array_for_counting = numpy.array([[3, 1, 10],
                                          [3, 5, 9],
                                          [2, 5, 3],
                                          [2, 3, 8],
                                          [1, 3, 9]])
numpy.bincount(two_dim_array_for_counting[..., 0])  # count the values in the first column, across all rows
Out[36]: array([0, 1, 2, 2])  # 0 occurs 0 times, 1 occurs once, 2 occurs twice, and 3 occurs twice
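No number greater than 3 appears in the first column, so the output array has only 4 elements, giving the counts of the values 0, 1, 2, and 3 in the first column.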
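To get a count for every key, including the ones that never appear, the minlength argument of numpy.bincount can pad the result to a fixed length. The sketch below assumes the string labels are first mapped to 0-based integer codes by their position in keys; the mapping index_of and the names codes, first_col_counts, and all_col_counts are illustrative and not from the answer above:

```python
import numpy as np

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# Assumption: encode each label as its 0-based position in keys.
index_of = {key: i for i, key in enumerate(keys)}
codes = np.array([[index_of[item] for item in row] for row in data])

# minlength pads the result so there is one bin per key, even for unseen keys.
first_col_counts = np.bincount(codes[:, 0], minlength=len(keys))

# bincount is 1D only, so apply it once per column; result has shape (3, 15).
all_col_counts = np.array([np.bincount(column, minlength=len(keys))
                           for column in codes.T])

print(dict(zip(keys, first_col_counts.tolist())))
```

Because the codes are 0-based here, bin i lines up with keys[i], so no unused leading bin appears as it does with the 1-based numbers above.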