Group by hour and count by item in pandas

Time: 2018-11-27 05:59:51

Tags: python pandas pandas-groupby

I have the following dataset with datetime objects as the index:

index                                Item

2016-10-30 09:58:11                 Bread
2016-10-30 10:05:34          Scandinavian
2016-10-30 10:05:34          Scandinavian
2016-10-30 10:07:57         Hot chocolate
2016-10-30 10:07:57                   Jam
2016-10-30 10:07:57               Cookies
2016-10-30 10:19:12                Pastry
2016-10-30 10:19:12                Coffee
2016-10-30 10:19:12                   Tea
2016-10-30 10:20:51                Pastry
2016-10-30 10:20:51                 Bread
2016-10-30 10:21:59                 Bread
2016-10-30 10:21:59                Muffin

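I'm new to pandas and a bit confused about how to group the DataFrame. I need two things: 1) the count of each item per hour, e.g. the total number of "Bread" sold in each hour.

Something like the following: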

index           item          count

 2016-10-30 09:00:00   Bread   3
 2016-10-30 10:00:00  Coffee  10
 2016-10-30 11:00:00   Toast   1

2) then the total count of each item over a 24-hour period:

index          item  count

 2016-10-30    Bread  13
 2016-10-30   Coffee  1200
 2016-10-30    Toast  19

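Presumably these are two separate operations?

1 Answer:

Answer 0: (score: 1)

Use DatetimeIndex.floor to round each timestamp down to the hour (or day), then group by the floored dates together with Item and aggregate with GroupBy.size: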

print (type(df))
<class 'pandas.core.frame.DataFrame'>

dates = df.rename_axis('Dates').index.floor('H')
df1 = df.groupby([dates,'Item']).size().reset_index(name='count')
print (df1)
                Dates           Item  count
0 2016-10-30 09:00:00          Bread      1
1 2016-10-30 10:00:00          Bread      2
2 2016-10-30 10:00:00         Coffee      1
3 2016-10-30 10:00:00        Cookies      1
4 2016-10-30 10:00:00  Hot chocolate      1
5 2016-10-30 10:00:00            Jam      1
6 2016-10-30 10:00:00         Muffin      1
7 2016-10-30 10:00:00         Pastry      2
8 2016-10-30 10:00:00   Scandinavian      2
9 2016-10-30 10:00:00            Tea      1

dates = df.rename_axis('Dates').index.floor('24H')
df2 = df.groupby([dates,'Item']).size().reset_index(name='count')
print (df2)
       Dates           Item  count
0 2016-10-30          Bread      3
1 2016-10-30         Coffee      1
2 2016-10-30        Cookies      1
3 2016-10-30  Hot chocolate      1
4 2016-10-30            Jam      1
5 2016-10-30         Muffin      1
6 2016-10-30         Pastry      2
7 2016-10-30   Scandinavian      2
8 2016-10-30            Tea      1
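The two snippets above assume df already exists. As a minimal self-contained sketch, the sample rows from the question can be rebuilt and run through the same floor-then-size steps. The helper name count_items is hypothetical, and the lower-case 'h' and 'D' frequency strings are an assumption chosen to avoid the deprecation warning that recent pandas versions emit for 'H'; the answer itself uses 'H' and '24H'.

```python
import pandas as pd

# Sample data copied from the question; the timestamps become the DatetimeIndex.
rows = [
    ("2016-10-30 09:58:11", "Bread"),
    ("2016-10-30 10:05:34", "Scandinavian"),
    ("2016-10-30 10:05:34", "Scandinavian"),
    ("2016-10-30 10:07:57", "Hot chocolate"),
    ("2016-10-30 10:07:57", "Jam"),
    ("2016-10-30 10:07:57", "Cookies"),
    ("2016-10-30 10:19:12", "Pastry"),
    ("2016-10-30 10:19:12", "Coffee"),
    ("2016-10-30 10:19:12", "Tea"),
    ("2016-10-30 10:20:51", "Pastry"),
    ("2016-10-30 10:20:51", "Bread"),
    ("2016-10-30 10:21:59", "Bread"),
    ("2016-10-30 10:21:59", "Muffin"),
]
df = pd.DataFrame(rows, columns=["index", "Item"])
df["index"] = pd.to_datetime(df["index"])
df = df.set_index("index")


def count_items(frame, freq):
    """Count rows per (floored timestamp, Item) pair (hypothetical helper)."""
    # Round every timestamp down to the start of its period and name the key.
    dates = frame.index.floor(freq).rename("Dates")
    # size() counts rows per group; reset_index turns the result into columns.
    return frame.groupby([dates, "Item"]).size().reset_index(name="count")


hourly = count_items(df, "h")  # one row per hour and item
daily = count_items(df, "D")   # one row per calendar day and item
print(hourly)
print(daily)
```

Both results have the columns Dates, Item and count, matching the printed output in the answer.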

If the data is a Series rather than a DataFrame, group by the floored dates and the Series itself:

print (type(s))
<class 'pandas.core.series.Series'>

dates = s.rename_axis('Dates').index.floor('24H')
df2 = s.groupby([dates,s]).size().reset_index(name='count')
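The Series snippet above only shows the 24-hour grouping. The hourly variant follows the same pattern; here is a small sketch on made-up data, where the five sample rows and the name "Item" are assumptions for illustration, and 'h' is used in place of 'H' for the reason of deprecation in recent pandas versions.

```python
import pandas as pd

# Hypothetical sample Series named "Item" with a DatetimeIndex.
s = pd.Series(
    ["Bread", "Pastry", "Bread", "Bread", "Coffee"],
    index=pd.to_datetime([
        "2016-10-30 09:58:11",
        "2016-10-30 10:19:12",
        "2016-10-30 10:20:51",
        "2016-10-30 10:21:59",
        "2016-10-30 11:02:00",
    ]),
    name="Item",
)

# Floor timestamps to the hour and use them as the first grouping key.
hours = s.index.floor("h").rename("Dates")
# The Series itself is the second key, so each (hour, item) pair is counted.
hourly = s.groupby([hours, s]).size().reset_index(name="count")
print(hourly)
```

Passing name='count' to reset_index matters here: the counted values would otherwise keep the Series name, which collides with the Item index level.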