Question

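I have a pandas DataFrame named 'df' and I want to drop the seconds so that the index is in YYYY-MM-DD HH:MM format. The rows should also be grouped by minute, showing the mean value for each minute.

So I want to convert this DataFrame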

                        value
2015-05-03 00:00:00     61.0
2015-05-03 00:00:10     60.0
2015-05-03 00:00:25     60.0
2015-05-03 00:00:30     61.0
2015-05-03 00:00:45     61.0
2015-05-03 00:01:00     61.0
2015-05-03 00:01:10     60.0
2015-05-03 00:01:25     60.0
2015-05-03 00:01:30     61.0
2015-05-03 00:01:45     61.0
2015-05-03 00:02:00     61.0
2015-05-03 00:02:10     60.0
2015-05-03 00:02:25     60.0
2015-05-03 00:02:40     60.0
2015-05-03 00:02:55     60.0
2015-05-03 00:03:00     59.0
2015-05-03 00:03:15     59.0
2015-05-03 00:03:20     59.0
2015-05-03 00:03:35     59.0
2015-05-03 00:03:40     60.0

into this DataFrame

                        value
2015-05-03 00:00        60.6
2015-05-03 00:01        60.6
2015-05-03 00:02        60.2
2015-05-03 00:03        59.2

I tried code like this:

df['value'].resample('1Min').mean()

or

df.index.resample('1Min').mean()

but neither seems to work. Any ideas?

Answer 1

You need to convert the index to a DatetimeIndex first:

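import pandas as pd

# convert the existing index (e.g. strings) to a DatetimeIndex
df.index = pd.DatetimeIndex(df.index)
# equivalent alternative:
# df.index = pd.to_datetime(df.index)

# resample into 1-minute bins and take the mean of each bin
print(df['value'].resample('1Min').mean())
# equivalent alternative:
# print(df.resample('1Min')['value'].mean())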
2015-05-03 00:00:00    60.6
2015-05-03 00:01:00    60.6
2015-05-03 00:02:00    60.2
2015-05-03 00:03:00    59.2
Freq: T, Name: value, dtype: float64
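
For reference, here is a minimal self-contained sketch of the resample approach. It rebuilds the sample data from the question and then formats the index as YYYY-MM-DD HH:MM strings, which is the display the question asks for. The names `times`, `values` and `per_minute`, and the `strftime` step, are my own additions for illustration and are not part of the code above. Note that after `strftime` the index holds plain strings rather than timestamps, so do this last.

```python
import pandas as pd

# Sample data copied from the question; the index starts out as plain strings.
times = [
    "2015-05-03 00:00:00", "2015-05-03 00:00:10", "2015-05-03 00:00:25",
    "2015-05-03 00:00:30", "2015-05-03 00:00:45", "2015-05-03 00:01:00",
    "2015-05-03 00:01:10", "2015-05-03 00:01:25", "2015-05-03 00:01:30",
    "2015-05-03 00:01:45", "2015-05-03 00:02:00", "2015-05-03 00:02:10",
    "2015-05-03 00:02:25", "2015-05-03 00:02:40", "2015-05-03 00:02:55",
    "2015-05-03 00:03:00", "2015-05-03 00:03:15", "2015-05-03 00:03:20",
    "2015-05-03 00:03:35", "2015-05-03 00:03:40",
]
values = [61.0, 60.0, 60.0, 61.0, 61.0,
          61.0, 60.0, 60.0, 61.0, 61.0,
          61.0, 60.0, 60.0, 60.0, 60.0,
          59.0, 59.0, 59.0, 59.0, 60.0]
df = pd.DataFrame({"value": values}, index=times)

# resample() needs a datetime-like index, so convert the strings first.
df.index = pd.to_datetime(df.index)

# One row per minute, holding the mean of that minute's values.
per_minute = df.resample("1min").mean()

# Optional: turn the timestamps into "YYYY-MM-DD HH:MM" labels for display.
per_minute.index = per_minute.index.strftime("%Y-%m-%d %H:%M")

print(per_minute)
```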

Another solution is to set the seconds in the index to 0 by casting with astype:

# casting to minute resolution ('<M8[m]') truncates the seconds
print(df.groupby([df.index.values.astype('<M8[m]')])['value'].mean())
2015-05-03 00:00:00    60.6
2015-05-03 00:01:00    60.6
2015-05-03 00:02:00    60.2
2015-05-03 00:03:00    59.2
Name: value, dtype: float64
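
The same idea (truncate each timestamp to its minute, then group on the truncated value) can probably also be written with `DatetimeIndex.floor`, which some may find easier to read than the NumPy dtype cast. This is a sketch of that alternative, not the method shown above. It uses only the last two minutes of the question's data, and the names `times`, `values` and `by_minute` are hypothetical.

```python
import pandas as pd

# Rows for minutes 00:02 and 00:03, copied from the question.
times = [
    "2015-05-03 00:02:00", "2015-05-03 00:02:10", "2015-05-03 00:02:25",
    "2015-05-03 00:02:40", "2015-05-03 00:02:55", "2015-05-03 00:03:00",
    "2015-05-03 00:03:15", "2015-05-03 00:03:20", "2015-05-03 00:03:35",
    "2015-05-03 00:03:40",
]
values = [61.0, 60.0, 60.0, 60.0, 60.0, 59.0, 59.0, 59.0, 59.0, 60.0]
df = pd.DataFrame({"value": values}, index=pd.to_datetime(times))

# floor("min") rounds every timestamp down to the start of its minute;
# grouping on that key collects all rows that fall in the same minute.
by_minute = df.groupby(df.index.floor("min"))["value"].mean()

print(by_minute)
```

Unlike resample, a groupby only produces rows for minutes that actually occur in the data, so gaps in the index do not show up as NaN rows.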

Group index by minute and compute the average
