Let's say I have a list like this:
[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]
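The first number is the user_id, the second is the tv_program_code, and the third is the season_id.
How can I find the program_code values that have subscriptions to more than one season, and then print the user_id and tv_program_code? For example: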
9600001 17
Or do you have any suggestions on which data structure I should use?
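Answer 0 (score: 2)
One way is to use collections.Counter.
The idea is to count the number of seasons for each (user, program) combination with a dictionary, then filter for counts greater than 1 with a dictionary comprehension.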
from collections import Counter
lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)]
c = Counter()
for user, program, season in lst:
    c[(user, program)] += 1
print(c)
# Counter({(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2,
# (9600002, 14): 1, (9600001, 14): 1})
res = {k: v for k, v in c.items() if v > 1}
print(res)
# {(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2}
print(res.keys())
# dict_keys([(9600002, 42), (9600001, 17), (9600003, 11)])
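The question asks for each user_id and tv_program_code to be printed on one line. A minimal sketch of that last step, assuming the Counter approach above; the helper name repeated_pairs is hypothetical and not part of the original answer:

```python
from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)]


def repeated_pairs(rows):
    """Return (user_id, tv_program_code) pairs that occur in more than one row."""
    counts = Counter((user, program) for user, program, _season in rows)
    return [pair for pair, n in counts.items() if n > 1]


# Pairs come out in first-seen order on Python 3.7+, where dicts keep insertion order.
for user, program in repeated_pairs(lst):
    print(user, program)
# 9600002 42
# 9600001 17
# 9600003 11
```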
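Note on Counter versus defaultdict(int)
Counter is about twice as slow as defaultdict(int); see the benchmark below. If performance matters and none of the following features is relevant to you, you can easily switch to defaultdict(int):
1. Missing Counter keys don't get added automatically on lookup.
2. Counter objects can be added to and subtracted from one another.
3. Counter provides additional methods, such as elements and most_common.
Benchmark on Python 3.6.2: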
from collections import defaultdict, Counter
lst = lst * 100000
def counter(lst):
    c = Counter()
    for user, program, season in lst:
        c[(user, program)] += 1
    return c
def dd(lst):
    d = defaultdict(int)
    for user, program, season in lst:
        d[(user, program)] += 1
    return d
%timeit counter(lst) # 900 ms
%timeit dd(lst) # 450 ms
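The three Counter features listed in the note above can be illustrated with a small sketch. The keys and counts here are made-up illustration values, not data from the question:

```python
from collections import Counter, defaultdict

c = Counter({(9600001, 17): 3})
d = defaultdict(int, {(9600001, 17): 3})

# 1. Looking up a missing key returns 0 for both, but only defaultdict stores it.
missing = (9600002, 42)
print(c[missing], len(c))  # 0 1
print(d[missing], len(d))  # 0 2

# 2. Counter objects support + and -; subtraction drops non-positive counts.
a = Counter(x=3, y=1)
b = Counter(x=1, y=2)
print(a + b)  # Counter({'x': 4, 'y': 3})
print(a - b)  # Counter({'x': 2})

# 3. Extra helper methods that a plain defaultdict does not have.
print(a.most_common(1))      # [('x', 3)]
print(sorted(a.elements()))  # ['x', 'x', 'x', 'y']
```

If none of these behaviors is needed, the defaultdict(int) version in the benchmark is a drop-in replacement for the counting loop.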
Answer 1 (score: 1)
There are many ways to accomplish this task.
First, using defaultdict:
import collections
data=[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]
d=collections.defaultdict(list)
for i in data:
    d[(i[0],i[1])].append(i)
print(list(filter(lambda x:len(x)>1,d.values())))
Output:
[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]
Second, using itertools.groupby:
import itertools
print(list(filter(lambda x:len(x)>1,[list(j) for i,j in itertools.groupby(sorted(data),key=lambda x:(x[0],x[1]))])))
Output:
[[(9600001, 17, 1), (9600001, 17, 3), (9600001, 17, 4)], [(9600002, 42, 1), (9600002, 42, 3), (9600002, 42, 6)], [(9600003, 11, 1), (9600003, 11, 4)]]
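The groupby one-liner above is dense, so here is a more spelled-out sketch of the same idea that also prints the user_id and tv_program_code the question asks for. The function name multi_season is hypothetical; note that groupby only merges adjacent items, which is why the data is sorted by the same key first:

```python
from itertools import groupby
from operator import itemgetter

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]

key = itemgetter(0, 1)  # group on (user_id, tv_program_code)


def multi_season(rows):
    """Return (user, program, seasons) for pairs that appear in more than one row."""
    result = []
    for (user, program), group in groupby(sorted(rows, key=key), key=key):
        seasons = [season for _, _, season in group]
        if len(seasons) > 1:
            result.append((user, program, seasons))
    return result


for user, program, seasons in multi_season(data):
    print(user, program, seasons)
# 9600001 17 [3, 1, 4]
# 9600002 42 [3, 6, 1]
# 9600003 11 [1, 4]
```

Sorting makes this O(n log n), whereas the dictionary-based approaches run in a single O(n) pass.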
Third approach
Finally, you can also try a manual approach without any imports:
d={}
for i in data:
    if (i[0],i[1]) not in d:
        d[(i[0],i[1])]=[i]
    else:
        d[(i[0],i[1])].append(i)
print(list(filter(lambda x:len(x)>1,d.values())))
Output:
[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]
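All of the approaches above count rows, which equals the number of seasons only if no (user, program, season) row is repeated. The sample data has no such repeats, so the following is only a sketch under the assumption that duplicates could occur in real data; the function name and the sample rows are hypothetical:

```python
from collections import defaultdict


def programs_with_multiple_seasons(rows):
    """Map (user_id, tv_program_code) to its set of distinct seasons, keeping sets larger than 1."""
    seasons = defaultdict(set)
    for user, program, season in rows:
        seasons[(user, program)].add(season)  # a set ignores repeated seasons
    return {pair: s for pair, s in seasons.items() if len(s) > 1}


# Hypothetical rows with duplicates: only (9600001, 17) has two distinct seasons.
rows = [(9600001, 17, 3), (9600001, 17, 3), (9600001, 17, 1),
        (9600002, 14, 5), (9600002, 14, 5)]

for user, program in programs_with_multiple_seasons(rows):
    print(user, program)
# 9600001 17
```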