Question:

I have timed events with associated groups, which can be generated as follows:

from numpy.random import randint, seed
import pandas as pd

seed(42)    # reproducibility

samp_N = 1000
# create times within 3 hours, and 15 random groups
df = pd.DataFrame({'time': randint(0,3*60*60, samp_N), 
                   'group': randint(0, 15, samp_N)})
# make a resample-able index from the seconds time values
df.set_index(pd.TimedeltaIndex(df.time, 's'), inplace=True)

The result looks like this:

          group   time
02:01:10     10   7270
00:14:20     13    860
01:29:50      9   5390
01:26:31     13   5191
...

When I try to resample the events, I get something undesirable:

df.resample('5T').count()

          group  time
00:00:04     28    28
00:05:04     18    18
00:10:04     32    32
...

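Unfortunately, the resampling periods start at an arbitrary offset (the first value in the data). It gets even more annoying when I also group the data, which I ultimately need to do: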

df.groupby('group').resample('5T').count()

Then I get a new offset for each group. What I want is for the sampling windows to start at exact times:

00:00:00   5 ...
00:05:00  17 ...
00:10:00  11 ...
...

There is a suggestion at https://stackoverflow.com/a/23966229:

df.groupby(pd.TimeGrouper('5Min')).count()

But that does not work either, because it also destroys the grouping required above.

Thanks for any hints!

Answer 1

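Unfortunately, I did not come up with a nice solution, only a workaround. I added a dummy row with a time value of zero, then grouped by time and by group: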

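# DataFrame.append was removed in pandas 2.0, so build the dummy row and concatenate it
dummy = pd.DataFrame({'time': [0], 'group': [-1]}, index=pd.to_timedelta([0], unit='s'))
df = pd.concat([dummy, df])
# group by 5-minute window and by group; the dummy row anchors the first window at zero
df = df.groupby([pd.Grouper(freq='5min'), 'group']).count().reset_index('group')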
df = df.loc[df['group']!=-1]
df.head()
        group  time
0 days      0     2
0 days      1     4
0 days      2     3
0 days      3     1
0 days      4     2
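
The dummy row is there only to pin the first window to zero. A different technique that needs no dummy row is to round each event time down to the start of its 5-minute window and group on that value together with the group. The following is a minimal sketch, not part of the answer above; the names `window` and `counts` are made up for illustration, and the data is regenerated with its own random generator, so the numbers will not necessarily match the question's.

```python
import numpy as np
import pandas as pd

# Regenerate data shaped like the question's (assumption: own RandomState,
# so the values may differ from the question's output)
rng = np.random.RandomState(42)
samp_N = 1000
df = pd.DataFrame({'time': rng.randint(0, 3 * 60 * 60, samp_N),
                   'group': rng.randint(0, 15, samp_N)})
df.index = pd.to_timedelta(df['time'].to_numpy(), unit='s')

# Round every event time down to the start of its 5-minute window,
# then count events per (group, window) pair
counts = (df.assign(window=df.index.floor('5min'))
            .groupby(['group', 'window'])
            .size()
            .rename('count'))

# Optional wide layout: one row per window, one column per group
table = counts.unstack('group', fill_value=0)
print(counts.head())
```

Unlike `resample`, this only lists windows that contain at least one event for a given group; the `unstack` call fills the missing (window, group) combinations with zero, but a window with no events in any group still does not appear.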

Answer 2

I am not sure whether this is the result you want:

>>> result = df.groupby(['group', pd.Grouper(freq='5Min')]).count().reset_index(level=0)
>>> result.head()
          group  time
00:05:00      0     2
00:10:00      0     1
00:15:00      0     3
00:20:00      0     2
00:30:00      0     1
>>> result.sort_index().head()
        group  time
0 days     10     1
0 days     14     3
0 days      2     1
0 days     13     1
0 days      4     3
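
To check whether a result like this has the exact window starts the question asks for, one option is to compare each window label with itself rounded down to the window size. This is a small sketch; `bins_are_aligned` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def bins_are_aligned(labels, freq='5min'):
    """Return True if every timedelta label is an exact multiple of freq."""
    labels = pd.TimedeltaIndex(labels)
    # a label is aligned when flooring it to freq leaves it unchanged
    return bool((labels == labels.floor(freq)).all())

# Labels like the desired output in the question
aligned = pd.to_timedelta(['00:00:00', '00:05:00', '00:10:00'])
# Labels like the unwanted resample output in the question
shifted = pd.to_timedelta(['00:00:04', '00:05:04', '00:10:04'])

print(bins_are_aligned(aligned))
print(bins_are_aligned(shifted))
```

With the `result` frame above, the call would be `bins_are_aligned(result.index)`.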

pandas: resampling timed events in a DataFrame into exact time bins
