I have the following dataset with `datetime` objects as the index:
index Item
2016-10-30 09:58:11 Bread
2016-10-30 10:05:34 Scandinavian
2016-10-30 10:05:34 Scandinavian
2016-10-30 10:07:57 Hot chocolate
2016-10-30 10:07:57 Jam
2016-10-30 10:07:57 Cookies
2016-10-30 10:19:12 Pastry
2016-10-30 10:19:12 Coffee
2016-10-30 10:19:12 Tea
2016-10-30 10:20:51 Pastry
2016-10-30 10:20:51 Bread
2016-10-30 10:21:59 Bread
2016-10-30 10:21:59 Muffin
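I'm new to Pandas and a bit confused about how to group the DataFrame. I need two things: 1) the count of each item per hour, e.g. the total number of "Bread" sold in each hour.
Something like the following: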
index item count
2016-10-30 09:00:00 Bread 3
2016-10-30 10:00:00 Coffee 10
2016-10-30 11:00:00 Toast 1
And then 2) the total count of each item over a 24-hour window:
index item count
2016-10-30 Bread 13
2016-10-30 Coffee 1200
2016-10-30 Toast 19
Presumably these are two separate operations?
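Answer 0 (score: 1)
Floor the index with DatetimeIndex.floor, then group by the floored timestamps together with the Item column and aggregate with GroupBy.size: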
print (type(df))
<class 'pandas.core.frame.DataFrame'>
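# Round each timestamp down to the start of its hour ('h' replaces the alias 'H', deprecated in pandas 2.2)
dates = df.rename_axis('Dates').index.floor('h')
# Count the rows in each (hour, Item) group
df1 = df.groupby([dates,'Item']).size().reset_index(name='count')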
print (df1)
Dates Item count
0 2016-10-30 09:00:00 Bread 1
1 2016-10-30 10:00:00 Bread 2
2 2016-10-30 10:00:00 Coffee 1
3 2016-10-30 10:00:00 Cookies 1
4 2016-10-30 10:00:00 Hot chocolate 1
5 2016-10-30 10:00:00 Jam 1
6 2016-10-30 10:00:00 Muffin 1
7 2016-10-30 10:00:00 Pastry 2
8 2016-10-30 10:00:00 Scandinavian 2
9 2016-10-30 10:00:00 Tea 1
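# Round each timestamp down to a 24-hour boundary (midnight) for daily totals
dates = df.rename_axis('Dates').index.floor('24h')
# Count the rows in each (day, Item) group
df2 = df.groupby([dates,'Item']).size().reset_index(name='count')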
print (df2)
Dates Item count
0 2016-10-30 Bread 3
1 2016-10-30 Coffee 1
2 2016-10-30 Cookies 1
3 2016-10-30 Hot chocolate 1
4 2016-10-30 Jam 1
5 2016-10-30 Muffin 1
6 2016-10-30 Pastry 2
7 2016-10-30 Scandinavian 2
8 2016-10-30 Tea 1
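For anyone who wants to reproduce the two tables above, here is a minimal self-contained sketch. It rebuilds the sample rows from the question and wraps the floor-then-size steps in a helper; the helper name `count_items` and the variable names `hourly` and `daily` are made up for illustration and are not part of pandas or of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; the timestamps become the index.
rows = [
    ("2016-10-30 09:58:11", "Bread"),
    ("2016-10-30 10:05:34", "Scandinavian"),
    ("2016-10-30 10:05:34", "Scandinavian"),
    ("2016-10-30 10:07:57", "Hot chocolate"),
    ("2016-10-30 10:07:57", "Jam"),
    ("2016-10-30 10:07:57", "Cookies"),
    ("2016-10-30 10:19:12", "Pastry"),
    ("2016-10-30 10:19:12", "Coffee"),
    ("2016-10-30 10:19:12", "Tea"),
    ("2016-10-30 10:20:51", "Pastry"),
    ("2016-10-30 10:20:51", "Bread"),
    ("2016-10-30 10:21:59", "Bread"),
    ("2016-10-30 10:21:59", "Muffin"),
]
df = pd.DataFrame(
    {"Item": [item for _, item in rows]},
    index=pd.to_datetime([ts for ts, _ in rows]),
)


def count_items(frame, freq):
    """Hypothetical helper: count rows per (floored timestamp, Item)."""
    # Floor every timestamp to the start of its bucket and name the key.
    dates = frame.index.floor(freq).rename("Dates")
    # Group by the floored index plus the Item column, then count rows.
    return frame.groupby([dates, "Item"]).size().reset_index(name="count")


hourly = count_items(df, "h")    # one row per (hour, item)
daily = count_items(df, "24h")   # one row per (day, item)

print(hourly)
print(daily)
```

So yes, these are two separate operations, but they differ only in the frequency string passed to floor.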
If the data is a Series instead:
print (type(s))
<class 'pandas.core.series.Series'>
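# Floor the Series index to a 24-hour boundary ('h' replaces the alias 'H', deprecated in pandas 2.2)
dates = s.rename_axis('Dates').index.floor('24h')
# A Series has no column label to group on, so pass the Series itself as the second key
df2 = s.groupby([dates,s]).size().reset_index(name='count')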
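A small runnable sketch of the Series case follows. The data here is invented for illustration (the last row on 2016-10-31 is not in the question) and the Series is assumed to be named Item, so that the item column in the result keeps that name.

```python
import pandas as pd

# Illustrative data: three rows on 2016-10-30 plus one made-up row on the next day.
s = pd.Series(
    ["Bread", "Coffee", "Bread", "Bread"],
    index=pd.to_datetime([
        "2016-10-30 09:58:11",
        "2016-10-30 10:19:12",
        "2016-10-30 10:20:51",
        "2016-10-31 08:00:00",
    ]),
    name="Item",
)

# Floor the index to 24-hour buckets and name the resulting key.
dates = s.index.floor("24h").rename("Dates")

# Group by the floored dates and by the values of the Series itself.
daily = s.groupby([dates, s]).size().reset_index(name="count")

print(daily)
```

Passing name='count' to reset_index matters here: the size result would otherwise carry the same name as the Item index level.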