Question

Let's say I have a list like this:

[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]

The first number is the user_id, the second is the tv_program_code, and the third is the season_id.

My question

How can I find each program_code that a user has subscribed to for more than one season, and then print the user_id and tv_program_code? For example:

9600001 17

Or do you have any suggestions about which data structure I should use?

Answer 1

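One approach is to use collections.Counter.

The idea is to count the number of seasons for each (user, program) combination using a dictionary.

Then keep only the counts greater than 1 with a dictionary comprehension.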

from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)] 

c = Counter()

for user, program, season in lst:
    c[(user, program)] += 1

print(c)

# Counter({(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2,
#          (9600002, 14): 1, (9600001, 14): 1})

res = {k: v for k, v in c.items() if v > 1}

print(res)

# {(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2}

print(res.keys())

# dict_keys([(9600002, 42), (9600001, 17), (9600003, 11)])
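To get the exact output format asked for in the question (one "user_id tv_program_code" pair per line), the keys can be unpacked and printed. This is a minimal sketch; the helper name multi_season_pairs is made up for illustration and is not part of the answer above.

```python
from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
       (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
       (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
       (9600002, 42, 1)]

# Counter accepts any iterable, so the explicit loop can be replaced
# by a generator expression that yields the (user, program) key.
counts = Counter((user, program) for user, program, _season in lst)


def multi_season_pairs(counts):
    """Return the (user, program) pairs that occur more than once."""
    return [pair for pair, n in counts.items() if n > 1]


for user, program in multi_season_pairs(counts):
    print(user, program)

# On Python 3.7+ (insertion-ordered dicts) this prints:
# 9600002 42
# 9600001 17
# 9600003 11
```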

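A note on Counter vs. defaultdict(int)

Counter is twice as slow as defaultdict(int); see the benchmark below. If performance matters and none of the following features are relevant to you, you can easily switch to defaultdict(int):

Missing Counter keys don't get added automatically when you query them.
You can add and subtract Counter objects.
Counter provides additional methods, e.g. elements and most_common.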
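The three differences listed above can be checked with a short sketch. The keys and counts below are made-up sample values chosen only to show the behaviour.

```python
from collections import Counter, defaultdict

c = Counter({(9600001, 17): 3})
d = defaultdict(int, {(9600001, 17): 3})

missing = (9600009, 99)  # hypothetical key that is in neither mapping

# 1. Looking up a missing key returns 0 in both cases, but only
#    defaultdict stores the key as a side effect.
c[missing]
d[missing]
print(missing in c)  # False
print(missing in d)  # True

# 2. Counters support + and - (results keep only positive counts).
other = Counter({(9600001, 17): 1, (9600003, 11): 5})
total = c + other
diff = total - c
print(total[(9600001, 17)])  # 4
print(diff == other)         # True

# 3. Extra helper methods.
print(total.most_common(1))                   # [((9600003, 11), 5)]
print(list(Counter(a=2, b=1).elements()))     # ['a', 'a', 'b']
```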

Benchmark on Python 3.6.2.

from collections import defaultdict, Counter

lst = lst * 100000

def counter(lst):
    c = Counter()
    for user, program, season in lst:
        c[(user, program)] += 1
    return c

def dd(lst):
    d = defaultdict(int)
    for user, program, season in lst:
        d[(user, program)] += 1
    return d

%timeit counter(lst)  # 900 ms
%timeit dd(lst)       # 450 ms
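%timeit only works inside IPython or Jupyter. Outside of those, a comparable measurement can be sketched with the standard timeit module. The function names and the smaller list size here are my own choices, and the timings will vary with the machine and the Python version, so no figures are quoted.

```python
import timeit
from collections import Counter, defaultdict

base = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]
big = base * 1000  # smaller than the benchmark above, to keep the run short


def count_with_counter(rows):
    c = Counter()
    for user, program, _season in rows:
        c[(user, program)] += 1
    return c


def count_with_defaultdict(rows):
    d = defaultdict(int)
    for user, program, _season in rows:
        d[(user, program)] += 1
    return d


# Both functions must produce the same counts before timing them.
assert dict(count_with_counter(big)) == dict(count_with_defaultdict(big))

for fn in (count_with_counter, count_with_defaultdict):
    # Take the best of 3 repeats, each repeat calling the function 5 times.
    best = min(timeit.repeat(lambda: fn(big), number=5, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s per 5 calls")
```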

Answer 2

There are many ways to accomplish this task.

First, using defaultdict:

import collections
data=[(9600002, 42, 3),
(9600001, 17, 3),
(9600003, 11, 1),
(9600002, 14, 5),
(9600001, 17, 1),
(9600003, 11, 4),
(9600001, 17, 4),
(9600001, 14, 3),
(9600002, 42, 6),
(9600002, 42, 1)]

d=collections.defaultdict(list)

for i in data:
    d[(i[0],i[1])].append(i)

print(list(filter(lambda x:len(x)>1,d.values())))

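Output (the order of the groups can vary with the Python version, since dicts only guarantee insertion order from Python 3.7 on):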

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]

Second, using itertools.groupby:

import itertools

print(list(filter(lambda x:len(x)>1,[list(j) for i,j in itertools.groupby(sorted(data),key=lambda x:(x[0],x[1]))])))

Output:

[[(9600001, 17, 1), (9600001, 17, 3), (9600001, 17, 4)], [(9600002, 42, 1), (9600002, 42, 3), (9600002, 42, 6)], [(9600003, 11, 1), (9600003, 11, 4)]]
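The sorted() call in the one-liner matters: groupby only merges rows that sit next to each other. The sketch below spells the same idea out over several lines and shows what happens without sorting. Using operator.itemgetter for the key is my substitution for the lambda; the behaviour is the same.

```python
from itertools import groupby
from operator import itemgetter

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1)]

pair_of = itemgetter(0, 1)  # extracts (user_id, tv_program_code)

# Sorting first puts equal (user, program) keys next to each other.
groups = [list(rows) for _key, rows in groupby(sorted(data), key=pair_of)]
multi = [g for g in groups if len(g) > 1]

# Without sorting, the same key is split into several separate runs.
unsorted_groups = [list(rows) for _key, rows in groupby(data, key=pair_of)]

print(len(groups))           # 5 distinct (user, program) pairs
print(len(unsorted_groups))  # 9 runs, so the counts would be wrong
print([pair_of(g[0]) for g in multi])
# [(9600001, 17), (9600002, 42), (9600003, 11)]
```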

Third approach

Finally, you can also try a manual approach that doesn't use any imports:

d={}

for i in data:
    if (i[0],i[1]) not in d:
        d[(i[0],i[1])]=[i]
    else:
        d[(i[0],i[1])].append(i)

print(list(filter(lambda x:len(x)>1,d.values())))

Output:

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]]
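On the data-structure part of the question, one option is a dict that maps each (user_id, tv_program_code) pair to a set of season ids. This is a sketch under an assumption the question does not state: that the same row might appear twice, in which case a set counts distinct seasons rather than rows. The helper name seasons_by_pair and the duplicated row are invented for illustration.

```python
data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1),
        (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4),
        (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6),
        (9600002, 42, 1),
        (9600002, 14, 5)]  # hypothetical duplicate row, added for the demo


def seasons_by_pair(rows):
    """Map (user, program) to the set of distinct season ids."""
    result = {}
    for user, program, season in rows:
        # setdefault creates the empty set the first time a pair is seen.
        result.setdefault((user, program), set()).add(season)
    return result


grouped = seasons_by_pair(data)

for (user, program), seasons in grouped.items():
    if len(seasons) > 1:
        print(user, program, sorted(seasons))

# (9600002, 14) appears in two rows but only in season 5,
# so it is not reported.
```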
