Sorting a list of lists using groupby and count

Time: 2018-07-11 23:09:50

Tags: python python-2.7

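I am looking to sort a list of lists. My function needs to return the day with the fewest occurrences of a given activity and, if there is a tie, the day with the fewest activities overall. Below is a working solution, but it feels rather unconventional because it converts the data to a dictionary and back to a list, and I am looking for a faster way to write this code.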

print get_day(mylist, 'Activity C') should produce Day 1

print get_day(mylist, 'Activity A') should produce Day 2

def get_day(l, activity):
    d = {}

    for x in l:
        if x[0] not in d.keys():
            d[x[0]] = []
        d[x[0]].append(x[1])

    d = {k: [v.count(activity), len(v)] for k, v in d.items()}

    l = [[k, v[0], v[1]] for k, v in d.items()]

    l = sorted(l, key=lambda x: (x[1], x[2]))
    return l[0][0]


mylist = [['Day 1', 'Activity A'], ['Day 2', 'Activity A'], ['Day 1', 'Activity A'], ['Day 2', 'Activity C'],
          ['Day 2', 'Activity D']]
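For reference, the grouping named in the title can be sketched with itertools.groupby. This is only a minimal sketch, not the asker's method: the name get_day_groupby is hypothetical, and the final tie-break on the day name is an assumption (the code above leaves that case to dictionary order).

```python
from itertools import groupby
from operator import itemgetter


def get_day_groupby(rows, activity):
    """Return the day with the fewest `activity` entries; ties go to the
    day with the fewest entries overall. Returns None for empty input."""
    stats = []
    # groupby only merges adjacent items, so sort by day first.
    by_day = sorted(rows, key=itemgetter(0))
    for day, group in groupby(by_day, key=itemgetter(0)):
        activities = [act for _, act in group]
        stats.append((activities.count(activity), len(activities), day))
    # Tuples compare element by element: target count, total count, day name
    # (the day-name tie-break is an assumption of this sketch).
    return min(stats)[2] if stats else None


mylist = [['Day 1', 'Activity A'], ['Day 2', 'Activity A'],
          ['Day 1', 'Activity A'], ['Day 2', 'Activity C'],
          ['Day 2', 'Activity D']]
```

The sort makes this O(n log n), so it is unlikely to beat a single counting pass on large inputs; it mainly removes the round trip from dictionary to list.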

2 answers:

Answer 0 (score: 2)

Without knowing the expected input size and use case, no speedup can be guaranteed here, but I think this code is more Pythonic.

from collections import defaultdict, Counter

def get_day_pythonic(lst, activity):
    if not lst:
        return
    # Total number of activities per day (day -> count)
    day_act_counts = Counter([d for (d, a) in lst])
    # Per-activity counts for each day (activity -> day -> count)
    act_counter = defaultdict(Counter)
    for (d, a) in lst:
        act_counter[a][d] += 1
    # NOTE: if planning to call this multiple times, should precompute day_act_counts and act_counter.
    # Here we sort first by lowest count of activity, then total activity counts, and then day name.
    return sorted([(act_counter[activity][d], day_act_counts[d], d) for d in day_act_counts])[0][-1]
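The NOTE in the code above suggests precomputing both counters when the function is called many times on the same list. One way that split might look is sketched below; build_counts and get_day_from_counts are hypothetical names introduced here, not part of the answer.

```python
from collections import Counter, defaultdict


def build_counts(lst):
    """Scan the list once and return both lookup tables."""
    day_totals = Counter()               # day -> number of entries of any activity
    per_activity = defaultdict(Counter)  # activity -> (day -> count)
    for day, act in lst:
        day_totals[day] += 1
        per_activity[act][day] += 1
    return day_totals, per_activity


def get_day_from_counts(day_totals, per_activity, activity):
    """Answer one query from the precomputed tables; None if there are no days."""
    if not day_totals:
        return None
    # .get avoids inserting a new key into the defaultdict for unknown activities.
    target = per_activity.get(activity, Counter())
    # A Counter returns 0 for days on which the activity never occurred.
    return min((target[d], day_totals[d], d) for d in day_totals)[-1]
```

Each query then costs time proportional to the number of distinct days rather than to the length of the list.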

Edit: a faster implementation

def get_day(lst, activity):
    if not lst:
        return
    # Count of all activities by day
    day_act_counts = {}
    # Count of interested activity by day
    act_counter = {}
    for (d, a) in lst:
        day_act_counts[d] = day_act_counts.get(d, 0) + 1
        if a != activity:  # don't need exact count for other activities
            continue
        act_counter[d] = act_counter.get(d, 0) + 1
    # Here we take the min first by lowest count of activity, then total activity counts, and then day name.
    return min((act_counter.get(d, 0), day_act_counts[d], d) for d in day_act_counts)[-1]

Answer 1 (score: 1)

First, we can write a utility that collects pairs by their first coordinate:

from collections import defaultdict


def collect(items):
    result = defaultdict(list)
    for key, value in items:
        result[key].append(value)
    return result

After that, our get_day function can be written as

from collections import Counter
from itertools import imap


def get_day(days_activities, target_activity):
    activities_by_days = collect(days_activities)
    days_by_activities = collect(imap(reversed, days_activities))
    days_target_activity_counter = Counter(days_by_activities[target_activity])

    def to_target_and_overall_activities_counts(day):
        return (days_target_activity_counter[day],
                # tie-breaker: total number of activities on that day
                len(activities_by_days[day]))

    return min(activities_by_days,
               key=to_target_and_overall_activities_counts)

Tests

# 'Day 1' has fewest overall activities (3 < 4)
>>> mylist = [['Day 1', 'Activity A'],
              ['Day 1', 'Activity A'],
              ['Day 2', 'Activity A'],
              ['Day 2', 'Activity C'],
              ['Day 1', 'Activity D'],
              ['Day 2', 'Activity D'],
              ['Day 2', 'Activity E']]
>>> get_day(mylist, 'Activity C')
'Day 1'
>>> get_day(mylist, 'Activity A')
'Day 2'
>>> get_day(mylist, 'Activity D')
'Day 1'
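The get_day in this answer imports itertools.imap, which exists only in Python 2. If the code has to run on Python 3 (an assumption, since the question is tagged python-2.7), the built-in map is already lazy and can take its place. A minimal sketch follows, with get_day_py3 as a hypothetical name and the handling of empty input added as an assumption:

```python
from collections import Counter, defaultdict


def collect(items):
    """Group values by the first element of each (key, value) pair."""
    result = defaultdict(list)
    for key, value in items:
        result[key].append(value)
    return result


def get_day_py3(days_activities, target_activity):
    activities_by_days = collect(days_activities)
    # Built-in map is lazy on Python 3; reversed() yields (activity, day).
    days_by_activities = collect(map(reversed, days_activities))
    # .get avoids adding an empty entry for an activity that never occurs.
    target_counter = Counter(days_by_activities.get(target_activity, []))

    def sort_key(day):
        # Target-activity count first, then the day's total as tie-breaker.
        return target_counter[day], len(activities_by_days[day])

    # default=None (Python 3.4+) covers empty input instead of raising ValueError.
    return min(activities_by_days, key=sort_key, default=None)


mylist = [['Day 1', 'Activity A'], ['Day 1', 'Activity A'],
          ['Day 2', 'Activity A'], ['Day 2', 'Activity C'],
          ['Day 1', 'Activity D'], ['Day 2', 'Activity D'],
          ['Day 2', 'Activity E']]
```

On the test data above this should give the same three results as the Python 2 version.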