Question

I want to use pandas to aggregate some data by hour and show the date instead of an integer index.

My current code is as follows:

import pandas as pd
import numpy as np

dates = pd.date_range('1/1/2011', periods=20, freq='25min')
data = pd.Series(np.random.randint(100, size=20), index=dates)

result = data.groupby(data.index.hour).sum().reset_index(name='Sum')

print(result)

This prints the following:

   index  Sum
0      0  131
1      1  116
2      2  180
3      3   62
4      4   95
5      5  107
6      6   89
7      7  169

The problem is that I want to show the date associated with each hour instead of the index.

The result I am trying to achieve looks like this:

       index                Sum
0      2011-01-01 01:00:00  131
1      2011-01-01 02:00:00  116
2      2011-01-01 03:00:00  180
3      2011-01-01 04:00:00   62
4      2011-01-01 05:00:00   95
5      2011-01-01 06:00:00  107
6      2011-01-01 07:00:00   89
7      2011-01-01 08:00:00  169

Is there an easy way to do this with pandas?

Answer 1

data.groupby(data.index.strftime('%Y-%m-%d %H:00:00')).sum().reset_index(name='Sum')
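
For context, here is a minimal sketch of how that one-liner fits into the question's setup. Note that strftime returns strings, so the resulting 'index' column holds text rather than timestamps. The pd.to_datetime conversion at the end is an optional addition, not part of the original answer, and the fixed seed is assumed only to make the run repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumed seed, only for repeatable numbers
dates = pd.date_range('1/1/2011', periods=20, freq='25min')
data = pd.Series(np.random.randint(100, size=20), index=dates)

# Format each timestamp as "date hour:00:00" and group on that label.
labels = data.index.strftime('%Y-%m-%d %H:00:00')
result = data.groupby(labels).sum().reset_index(name='Sum')

# The labels are plain strings; convert them back if real timestamps are needed.
result['index'] = pd.to_datetime(result['index'])
print(result)
```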

Answer 2

You can use resample.

data.resample('H').sum()

Output:

2011-01-01 00:00:00     84
2011-01-01 01:00:00    121
2011-01-01 02:00:00    160
2011-01-01 03:00:00     70
2011-01-01 04:00:00     88
2011-01-01 05:00:00    131
2011-01-01 06:00:00     56
2011-01-01 07:00:00    109
Freq: H, dtype: int32
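
To get the two-column layout the question asks for, the resampled Series can be turned into a DataFrame with reset_index. The following is a sketch under a couple of assumptions: the seed is there only for repeatability, and it uses the lowercase 'h' alias, which pandas 2.2 and later prefer over 'H' (the uppercase form triggers a deprecation warning there).

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumed seed, only for repeatable numbers
dates = pd.date_range('1/1/2011', periods=20, freq='25min')
data = pd.Series(np.random.randint(100, size=20), index=dates)

# Bin the values into hourly buckets labelled by the start of each hour.
hourly = data.resample('h').sum()

# Move the DatetimeIndex into a regular column; an unnamed index becomes 'index'.
result = hourly.reset_index(name='Sum')
print(result)
```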

Option #2

data.groupby(data.index.floor('H')).sum()

Output:

2011-01-01 00:00:00     84
2011-01-01 01:00:00    121
2011-01-01 02:00:00    160
2011-01-01 03:00:00     70
2011-01-01 04:00:00     88
2011-01-01 05:00:00    131
2011-01-01 06:00:00     56
2011-01-01 07:00:00    109
dtype: int32
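
The two options give identical sums here because every hour contains data. They can differ when the series has gaps: resample emits a row for every hour in the range (with a sum of 0 for empty hours), whereas grouping on the floored index only returns hours that actually occur. A small sketch with made-up values illustrates the difference:

```python
import pandas as pd

# Hypothetical series with no observations between 01:00 and 03:00.
idx = pd.to_datetime(['2011-01-01 00:10', '2011-01-01 00:40', '2011-01-01 03:20'])
data = pd.Series([1, 2, 3], index=idx)

by_resample = data.resample('h').sum()
by_floor = data.groupby(data.index.floor('h')).sum()

print(by_resample)  # four rows: 00:00, 01:00, 02:00, 03:00 (the empty hours sum to 0)
print(by_floor)     # two rows: 00:00 and 03:00 only
```

Which behaviour is preferable depends on whether empty hours should appear in the report.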

Python Pandas: aggregate data by hour and display the date instead of the index
