I have a list like this:
dttrain = [["sunny","hot","high","false","no"],
["sunny","hot","high","true","no"],
["overcast","hot","high","false","yes"]]
I want to count how often each value occurs together with the value at the last index, for example:
sunny no = 2, sunny yes = 0, hot no = 2, hot yes = 1.
I tried this code of my own:
c = Counter(x for sublist in dttrain for x in sublist)
But this gives:
Counter({'yes': 9, 'false': 8, 'high': 7, 'normal': 7, 'true': 6, 'mild': 6, 'sunny': 5, 'no': 5, 'rainy': 5, 'hot': 4,'overcast': 4, 'cool': 4})
Answer 0 (score: 2)
Here is some untidy code:
def iterate_key(name, _counter):
    """
    Increment the count for a key in a dict, creating the key if it doesn't exist.
    :param str name: Name of the key to increment.
    :param dict[str, int] _counter: Dictionary storing count data.
    :return: _counter after incrementing the key.
    """
    # Check the dict that was passed in, not the module-level `counter`.
    if name not in _counter:
        _counter[name] = 1
    else:
        _counter[name] += 1
    return _counter

counter = {}
for sub_list in dttrain:
    # Pair the first column with the label in the last column.
    key_name = '{}_{}'.format(sub_list[0], sub_list[-1])
    counter = iterate_key(key_name, counter)
    # Pair the second column with the label in the last column.
    key_name = '{}_{}'.format(sub_list[1], sub_list[-1])
    counter = iterate_key(key_name, counter)
print(counter)
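The same tally can be sketched more compactly with collections.Counter, which creates missing keys on its own and so does not need the helper function. This is only a sketch under a few assumptions: dttrain is the three-row sample from the question, only the first two columns are paired with the label, and the keys are (value, label) tuples rather than joined strings, which is a choice made here and not part of the answer above.

```python
from collections import Counter

# Sample data copied from the question.
dttrain = [["sunny", "hot", "high", "false", "no"],
           ["sunny", "hot", "high", "true", "no"],
           ["overcast", "hot", "high", "false", "yes"]]

# Count (feature value, label) pairs for the first two columns of each row.
counter = Counter()
for row in dttrain:
    label = row[-1]
    for value in row[:2]:
        counter[(value, label)] += 1

print(dict(counter))
```

Tuple keys avoid any ambiguity that could arise if a value itself contained the separator character.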
Answer 1 (score: 1)
Use itertools (product and chain) together with collections.Counter:
from itertools import product, chain
from collections import Counter
{' '.join(k):v for k,v in Counter(chain(*[product(i[:2],[i[-1]]) for i in dttrain])).items()}
Output:
{'hot no': 2, 'hot yes': 1, 'overcast yes': 1, 'sunny no': 2}
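The output above lists only the pairs that actually occur, whereas the question also asks for zero counts such as sunny yes = 0. One possible way to get them, sketched here under the assumption that only the first two columns matter and that dttrain is the sample from the question, is to keep the Counter (which returns 0 for missing keys) and look up every value/label combination. The names pairs, values, labels and table are illustrative only.

```python
from collections import Counter
from itertools import chain, product

# Sample data copied from the question.
dttrain = [["sunny", "hot", "high", "false", "no"],
           ["sunny", "hot", "high", "true", "no"],
           ["overcast", "hot", "high", "false", "yes"]]

# Count (value, label) pairs for the first two columns of every row.
pairs = Counter(chain.from_iterable(
    product(row[:2], [row[-1]]) for row in dttrain))

# Every distinct feature value and every distinct label seen in the data.
values = sorted({v for row in dttrain for v in row[:2]})
labels = sorted({row[-1] for row in dttrain})

# Counter returns 0 for combinations that never occur.
table = {"{} {}".format(v, l): pairs[(v, l)]
         for v, l in product(values, labels)}
print(table)
```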