Aggregating within a group

Time: 2016-02-20 12:21:36

Tags: python pandas group-by dataframe

I want to count the number of times an ID appears within a given week. Here is my DataFrame:

import pandas as pd

dat = pd.DataFrame({
    'week': ['week_1', 'week_1', 'week_1', 'week_1', 'week_1', 'week_1', 'week_1', 'week_1', 'week_1', 'week_2', 'week_2', 'week_2', 'week_2', 'week_2', 'week_2', 'week_2', 'week_2', 'week_2', 'week_2'],
    'hour': [4, 5, 17, 3, 2, 4, 11, 19, 4, 5, 2, 15, 10, 12, 4, 8, 9, 10, 11],
    'ds': ['2015-05-09', '2015-05-09', '2015-05-09', '2015-05-09', '2015-05-09', '2015-05-10', '2015-05-10', '2015-05-10', '2015-05-11', '2015-06-17', '2015-06-17', '2015-06-18', '2015-06-18', '2015-06-18', '2015-06-19', '2015-06-19', '2015-06-19', '2015-06-19', '2015-06-20'],
    'id': ['b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b4', 'b7', 'b2', 'b8', 'b9', 'b1', 'b2', 'b4', 'b4', 'b8', 'b10', 'b1', 'b2']})

>>> dat
        ds        hour   id     week
 0   2015-05-09     4     b1    week_1
 1   2015-05-09     5     b2    week_1
 2   2015-05-09    17     b3    week_1
 3   2015-05-09     3     b4    week_1
 4   2015-05-09     2     b5    week_1
 5   2015-05-10     4     b6    week_1
 6   2015-05-10    11     b4    week_1
 7   2015-05-10    19     b7    week_1
 8   2015-05-11     4     b2    week_1
 9   2015-06-17     5     b8    week_2
10   2015-06-17     2     b9    week_2
11   2015-06-18    15     b1    week_2
12   2015-06-18    10     b2    week_2
13   2015-06-18    12     b4    week_2
14   2015-06-19     4     b4    week_2
15   2015-06-19     8     b8    week_2
16   2015-06-19     9    b10    week_2
17   2015-06-19    10     b1    week_2
18   2015-06-20    11     b2    week_2

I would like to get a DataFrame that looks like this:

      week    id  0
 0   week_1   b1  1
 1   week_1   b2  2
 2   week_1   b3  1
 3   week_1   b4  2
 4   week_1   b5  1
 5   week_1   b6  1
 6   week_1   b7  1
 7   week_2   b1  2
 8   week_2  b10  1
 9   week_2   b2  2
10   week_2   b4  2
11   week_2   b8  2
12   week_2   b9  1

Ideally, this should be done in an efficient way.

My current code does produce the result I want:

# Step 1: one row per (week, ds, id) combination
dat2 = pd.DataFrame(dat.groupby(['week', 'ds', 'id']).size())
dat2.reset_index(inplace=True)
# Step 2: count those rows per (week, id)
dat3 = pd.DataFrame(dat2.groupby(['week', 'id']).size())
dat3.reset_index(inplace=True)

I know there must be a better way.

1 Answer:

Answer 0 (score: 1)

You can get your final dat3 result with this simple groupby:
>>> dat.groupby(['week', 'id'], as_index=False)['id'].count().reset_index()
      week   id  0
0   week_1   b1  1
1   week_1   b2  2
2   week_1   b3  1
3   week_1   b4  2
4   week_1   b5  1
5   week_1   b6  1
6   week_1   b7  1
7   week_2   b1  2
8   week_2  b10  1
9   week_2   b2  2
10  week_2   b4  2
11  week_2   b8  2
12  week_2   b9  1

The trick is to specify as_index=False so that the id column stays available to the count function.
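
As a hedged alternative that does not rely on selecting a grouping column, the sketch below counts rows with size() and names the count column explicitly. It also shows nunique() on ds, since the two-step code in the question actually counts distinct days per (week, id); on the question's data the two numbers coincide because no id appears twice on the same day. The frame df and the column names count and n_days are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Small hypothetical frame with the same columns as `dat` in the question.
df = pd.DataFrame({
    'week': ['week_1', 'week_1', 'week_1', 'week_2', 'week_2'],
    'ds':   ['2015-05-09', '2015-05-11', '2015-05-09', '2015-06-18', '2015-06-18'],
    'id':   ['b2', 'b2', 'b4', 'b1', 'b1'],
})

# Rows per (week, id): size() counts the rows in each group, and
# reset_index(name=...) flattens the result into a DataFrame with a named count column.
row_counts = df.groupby(['week', 'id']).size().reset_index(name='count')

# Distinct days per (week, id): this is what grouping by (week, ds, id)
# first and then by (week, id) computes in the question.
day_counts = df.groupby(['week', 'id'])['ds'].nunique().reset_index(name='n_days')

print(row_counts)
print(day_counts)
```

The two results differ only when an id shows up more than once on the same day, as b1 does in week_2 of this toy frame, so which one to use depends on whether repeated same-day appearances should count.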