How to count in Python

Time: 2018-04-08 14:23:39

Tags: python list

Let's say I have a list like this:

[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)] 

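The first number is user_id, the second is tv_program_code, and the third is season_id.

My question

How can I find the program_code values that are subscribed to for more than one season, and then print the user_id and tv_program_code? For example:

9600001 17

Or do you have any suggestions on which data structure I should use?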

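2 answers:

Answer 0: (score: 2)

One way is to use collections.Counter.

The idea is to use a dictionary to count the number of seasons for each (user, program) combination.

Then filter for counts greater than 1 with a dictionary comprehension.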

from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)] 

c = Counter()

for user, program, season in lst:
    c[(user, program)] += 1

print(c)

# Counter({(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2,
#          (9600002, 14): 1, (9600001, 14): 1})

res = {k: v for k, v in c.items() if v > 1}

print(res)

# {(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2}

print(res.keys())

# dict_keys([(9600002, 42), (9600001, 17), (9600003, 11)])
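
To get output in the exact form the question asks for (user_id and tv_program_code on one line), the filtered keys can be printed directly. This is a minimal sketch on the same sample data. The name `repeated` is only illustrative, and passing a generator to Counter is a swap-in for the explicit loop above, not part of the original answer:

```python
from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)]

# Counter accepts any iterable of hashable items, so the explicit loop
# can be replaced by a generator of (user, program) pairs.
c = Counter((user, program) for user, program, _season in lst)

# Keep the pairs that occur more than once; sort them for a stable order.
repeated = sorted(pair for pair, n in c.items() if n > 1)

for user, program in repeated:
    print(user, program)
# 9600001 17
# 9600002 42
# 9600003 11
```

Note that this counts rows, so it assumes each (user, program, season) row appears at most once in the list.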

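A note on Counter versus defaultdict(int)

Counter is about twice as slow as defaultdict(int); see the benchmark below. If performance matters and none of the following Counter features are relevant to you, you can easily switch to defaultdict(int):

  1. Missing Counter keys don't get added automatically when you query them.
  2. You can add and subtract Counter objects.
  3. Counter provides additional methods, for example elements and most_common.

Benchmark on Python 3.6.2: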

    from collections import defaultdict, Counter
    
    lst = lst * 100000
    
    def counter(lst):
        c = Counter()
        for user, program, season in lst:
            c[(user, program)] += 1
        return c
    
    def dd(lst):
        d = defaultdict(int)
        for user, program, season in lst:
            d[(user, program)] += 1
        return d
    
    # %timeit is an IPython magic command; timings are as reported in this answer.
    %timeit counter(lst)  # 900 ms
    %timeit dd(lst)       # 450 ms
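
The three Counter features listed in the note above can be sketched as follows. The toy keys ("a", "b", "zzz") are made up for illustration and are not part of the original answer:

```python
from collections import Counter, defaultdict

c = Counter(["a", "b", "a"])            # a: 2, b: 1
d = defaultdict(int, {"a": 2, "b": 1})

# 1. Looking up a missing key returns 0 in both cases, but only
#    defaultdict(int) inserts the key as a side effect.
missing_c = c["zzz"]
missing_d = d["zzz"]
print("zzz" in c, "zzz" in d)   # False True

# 2. Counter objects can be added and subtracted.
total = c + Counter(["b", "c"])  # total == Counter({'a': 2, 'b': 2, 'c': 1})
diff = c - Counter(["a"])        # diff == Counter({'a': 1, 'b': 1})

# 3. Extra methods that a plain defaultdict does not have.
print(c.most_common(1))          # [('a', 2)]
print(sorted(c.elements()))      # ['a', 'a', 'b']
```

If none of these behaviors is needed, the defaultdict(int) version of the counting loop produces the same counts.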
    

Answer 1: (score: 1)

There are many ways to accomplish this task.

First, using defaultdict:

import collections
data=[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]

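d = collections.defaultdict(list)

# Group the rows by their (user_id, tv_program_code) pair.
for i in data:
    d[(i[0], i[1])].append(i)

# Keep only the groups that contain more than one row.
print(list(filter(lambda x: len(x) > 1, d.values())))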

Output:

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]
  

Second, using itertools groupby:

import itertools

print(list(filter(lambda x:len(x)>1,[list(j) for i,j in itertools.groupby(sorted(data),key=lambda x:(x[0],x[1]))])))

Output:

[[(9600001, 17, 1), (9600001, 17, 3), (9600001, 17, 4)], [(9600002, 42, 1), (9600002, 42, 3), (9600002, 42, 6)], [(9600003, 11, 1), (9600003, 11, 4)]]
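
The groupby one-liner above depends on the data being sorted first, because groupby only merges consecutive items that share a key. A more readable sketch of the same idea follows. Using operator.itemgetter as the key and the names `key`, `groups` and `unsorted_groups` are illustrative choices, not part of the original answer:

```python
from itertools import groupby
from operator import itemgetter

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]

key = itemgetter(0, 1)  # extracts (user_id, tv_program_code) from a row

# Sort by the grouping key, then collect the groups with more than one row.
groups = []
for pair, rows in groupby(sorted(data, key=key), key=key):
    rows = list(rows)
    if len(rows) > 1:
        groups.append(rows)

for rows in groups:
    user, program = key(rows[0])
    print(user, program, [season for _, _, season in rows])
# 9600001 17 [3, 1, 4]
# 9600002 42 [3, 6, 1]
# 9600003 11 [1, 4]

# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_groups = [list(rows) for _, rows in groupby(data, key=key)]
print(len(unsorted_groups))  # 9
```

Because the sort here uses only the (user, program) key and Python's sort is stable, the seasons inside each group keep their input order, unlike the plain sorted(data) call above, which also orders by season.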
  

Third approach

Finally, you can also try a manual approach without using any imports:

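d = {}

# Group the rows by (user_id, tv_program_code) using a plain dict.
for i in data:
    if (i[0], i[1]) not in d:
        d[(i[0], i[1])] = [i]
    else:
        d[(i[0], i[1])].append(i)

# Keep only the groups that contain more than one row.
print(list(filter(lambda x: len(x) > 1, d.values())))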

Output:

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]
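
All the approaches above count rows. If the same (user, program, season) row could appear more than once (an assumption; the sample data has no such duplicates), counting distinct seasons may be safer. The sketch below uses defaultdict(set) for that and prints the result in the user_id tv_program_code form the question asks for. The extra duplicate row and the names `seasons` and `result` are hypothetical:

```python
from collections import defaultdict

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1),
        (9600002, 14, 5)]  # hypothetical duplicate row, not in the original data

# Map each (user, program) pair to the set of distinct seasons seen for it.
seasons = defaultdict(set)
for user, program, season in data:
    seasons[(user, program)].add(season)

# A pair qualifies only if it has more than one distinct season.
result = sorted(pair for pair, s in seasons.items() if len(s) > 1)

for user, program in result:
    print(user, program)
# 9600001 17
# 9600002 42
# 9600003 11
```

Here (9600002, 14) appears in two rows but only for season 5, so it is not reported, whereas a plain row count would include it.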