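I'm trying to find the best variable to split a decision tree on, which requires grouping and counting the occurrences of certain values. The dummy dataset is: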
zipped=[('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'), ('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'), ('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'), ('e', 'Basic')]
So I want to know how many None, Basic and Premium entries each of a, b, c, d and e has. I need it to look something like:
{'a': {'None': 3, 'Basic': 0, 'Premium': 0}, 'b': {'None': 1, 'Basic': 1, 'Premium': 3}, …}
I'm also open to better ways of aggregating the data, or to a better data structure. Here is what I tried:
from collections import Counter
temp = Counter(x[1] for x in zipped if x[0] == 'b')
print(temp)
which gives:
Counter({'Premium': 3, 'None': 1, 'Basic': 1})
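Since the question is open to other data structures: one option, sketched here as an assumption rather than something from the thread, is to skip grouping entirely and count the (key, value) tuples themselves. A single Counter then holds every combination, and lookups of pairs that never occur return 0.

```python
from collections import Counter

zipped = [('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'),
          ('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'),
          ('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'),
          ('e', 'Basic')]

# Count each (key, value) pair directly; one pass over the data.
pair_counts = Counter(zipped)

# A pair that never occurs reads as 0 (Counter returns 0 for missing keys).
print(pair_counts[('b', 'Premium')])  # 3
print(pair_counts[('a', 'Basic')])    # 0
```

The trade-off is that you get a flat table keyed by tuples instead of the nested per-key layout asked for above.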
Answer 0 (score: 3)
Assuming your a, b, etc. stand for things like slashdot, google:
zipped=[('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'),
('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'),
('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'),
('e', 'Basic')]
from collections import Counter
d = {}
for key, val in zipped:
    d.setdefault(key, []).append(val)  # create key with empty list (if needed) + append val
# now each key maps to a list of its values; replace each list with a Counter of it:
for key in d:
    d[key] = Counter(d[key])
print(d)
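The two loops above (group into lists, then convert to Counters) can also be collapsed into a single pass. This is a minimal alternative sketch using collections.defaultdict, not the answerer's code: the defaultdict creates an empty Counter the first time a key is seen, so each value can be counted as it arrives.

```python
from collections import Counter, defaultdict

zipped = [('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'),
          ('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'),
          ('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'),
          ('e', 'Basic')]

# Missing keys get a fresh Counter automatically, so no setdefault is needed.
d = defaultdict(Counter)
for key, val in zipped:
    d[key][val] += 1  # group and count in the same step

print(dict(d))
```

The resulting counters are the same as in the answer; only the intermediate lists are avoided.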
Output:
{'a': Counter({'None': 3}),
'b': Counter({'Premium': 3, 'None': 1, 'Basic': 1}),
'c': Counter({'Basic': 2, 'None': 1}),
'd': Counter({'Basic': 1, 'None': 1}),
'e': Counter({'Basic': 2, 'None': 1})}
A Counter lets you call .most_common() to get the list you want:
for k in d:
    print(k, d[k].most_common())
Output:
a [('None', 3)]
b [('Premium', 3), ('None', 1), ('Basic', 1)]
c [('Basic', 2), ('None', 1)]
d [('Basic', 1), ('None', 1)]
e [('Basic', 2), ('None', 1)]
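Because most_common() sorts by count, its first entry is the majority label for each key, which is one quantity a decision-tree split typically looks at. The sketch below is an illustration under that assumption, not part of the original answer; the name majority is hypothetical. Per the Counter documentation, ties keep first-encountered order, so 'd' (Basic 1, None 1) resolves to 'Basic'.

```python
from collections import Counter, defaultdict

zipped = [('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'),
          ('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'),
          ('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'),
          ('e', 'Basic')]

d = defaultdict(Counter)
for key, val in zipped:
    d[key][val] += 1

# most_common(1) returns [(label, count)]; [0][0] picks out the label.
# Ties go to the label that was encountered first for that key.
majority = {key: counts.most_common(1)[0][0] for key, counts in d.items()}
print(majority)
```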
If you really need the 0 counts, you can add them afterwards:
allVals = {v for _, v in zipped}  # get distinct values of zipped
for key in d:
    for v in allVals:
        d[key].update([v])    # add value once
        d[key].subtract([v])  # subtract value once
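A bit clunky, but this way every value that occurs anywhere in zipped shows up in every key's Counter, with a count of 0 where that key never had it (subtract() keeps zero counts instead of removing them):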
for k in d:
    print(k, d[k].most_common())
Output:
a [('None', 3), ('Premium', 0), ('Basic', 0)]
b [('Premium', 3), ('None', 1), ('Basic', 1)]
c [('Basic', 2), ('None', 1), ('Premium', 0)]
d [('Basic', 1), ('None', 1), ('Premium', 0)]
e [('Basic', 2), ('None', 1), ('Premium', 0)]
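A less clunky way to get the explicit zeros, sketched here as an alternative rather than the answerer's method, relies on the fact that indexing a Counter with a missing label returns 0 without inserting it. Building plain dicts over a fixed, sorted list of labels yields exactly the nested shape the question asked for, with the same key order in every row.

```python
from collections import Counter, defaultdict

zipped = [('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'),
          ('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'),
          ('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'),
          ('e', 'Basic')]

d = defaultdict(Counter)
for key, val in zipped:
    d[key][val] += 1

# Every distinct label, sorted so each inner dict lists labels in the same order.
all_vals = sorted({v for _, v in zipped})

# counts[v] is 0 for labels a key never had, so every row gets all labels.
table = {key: {v: counts[v] for v in all_vals} for key, counts in d.items()}
print(table['a'])  # {'Basic': 0, 'None': 3, 'Premium': 0}
```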
Answer 1 (score: 0)
You could try something like this:
data=[('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'),
('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'),
('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'),
('e', 'Basic')]
manual_dict = {}
# group the values by key
for key, val in data:
    if key not in manual_dict:
        manual_dict[key] = [val]
    else:
        manual_dict[key].append(val)
final_dict = {}
# count each of the three known labels per key (0 if absent)
for ia, aj in manual_dict.items():
    final_dict[ia] = {'None': aj.count('None'), 'Basic': aj.count('Basic'), 'Premium': aj.count('Premium')}
print(final_dict)
Output:
{'c': {'Premium': 0, 'None': 1, 'Basic': 2}, 'a': {'Premium': 0, 'None': 3, 'Basic': 0}, 'd': {'Premium': 0, 'None': 1, 'Basic': 1}, 'b': {'Premium': 3, 'None': 1, 'Basic': 1}, 'e': {'Premium': 0, 'None': 1, 'Basic': 2}}
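The loop above hardcodes the three labels. If a new label might appear, one alternative sketch (an assumption, not from the thread) collects the labels from the data and groups with itertools.groupby. Note that groupby only merges adjacent items, so the data is sorted by key first.

```python
from itertools import groupby
from operator import itemgetter

data = [('a', 'None'), ('b', 'Premium'), ('c', 'Basic'), ('d', 'Basic'), ('b', 'Premium'),
        ('e', 'None'), ('e', 'Basic'), ('b', 'Premium'), ('a', 'None'), ('c', 'None'),
        ('b', 'None'), ('d', 'None'), ('c', 'Basic'), ('a', 'None'), ('b', 'Basic'),
        ('e', 'Basic')]

labels = sorted({v for _, v in data})  # every distinct label, found in the data

final_dict = {}
# Sort by key so that all entries for a key are adjacent, then group them.
for key, group in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0)):
    vals = [v for _, v in group]
    final_dict[key] = {label: vals.count(label) for label in labels}

print(final_dict)
```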