Counting occurrences within time intervals in a pandas DataFrame

Time: 2021-05-16 10:25:25

Tags: python pandas datetime aggregate

I have this simple DataFrame:

 Date and time        Event 
 --------------------------
 2020-03-23 9:05:03    A
 2020-03-23 14:06:02   B
 2020-03-23 9:06:43    B
 2020-03-23 12:11:50   D
 2020-03-23 12:12:38   D
 2020-03-23 12:13:17   B
 2020-03-23 12:14:07   A
 2020-03-23 12:14:54   A
 2020-04-29 10:37:09   A
 2020-04-29 10:39:13   A
 2020-04-29 11:53:33   A
 2020-04-29 12:04:46   C
 2020-04-30 19:15:29   D
 2020-04-30 16:18:04   B

I want to count how many times each Event occurs within 4-hour time intervals and build a new DataFrame from the result.

I would like to get something like this:

   10:00-14:00  14:00-18:00  18:00-22:00  22:00-02:00
A       2            1            3             0
B       0            1            1             2
C       1            2            1             1
D       0            0            0             2   

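I tried aggregating with resample, then I extracted the time from the datetime and applied a count, and I also tried various combinations with pd.TimeGrouper(), but none of them seemed to work. I can't figure out how to set up those 4-hour intervals so that I can apply the aggregation.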

At this point I have searched all the related posts and could not find a solution.

Any suggestions would be greatly appreciated.

2 answers:

Answer 0: (score: 0)

You can try binning by hour of day:

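import pandas as pd

# parse the timestamps so the .dt accessor is available
df['Date and time'] = pd.to_datetime(df['Date and time'])
# bin edges in hours; right=False makes each bin half-open, e.g. [10, 14)
bins = [10, 14, 18, 20, 24]
labels = ['10:00-14:00','14:00-18:00','18:00-20:00','20:00-24:00']
# hours before 10:00 fall outside the edges, so pd.cut assigns them NaN
df['TimeBin'] = pd.cut(df['Date and time'].dt.hour, bins, labels=labels, right=False)
result = df.pivot_table(index= ['Event'], columns=['TimeBin'], aggfunc='count')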
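
For reference, here is a self-contained sketch of the same binning idea that can be run as-is. It is not the answerer's exact code: it assumes six equal 4-hour bins covering the whole day (so the 9:xx rows are not dropped), zero-pads the hours in the sample data, and uses pd.crosstab for the counting. The names rows, edges and counts are made up for the illustration.

```python
import pandas as pd

# Sample rows copied from the question (hours zero-padded for unambiguous parsing).
rows = [
    ("2020-03-23 09:05:03", "A"), ("2020-03-23 14:06:02", "B"),
    ("2020-03-23 09:06:43", "B"), ("2020-03-23 12:11:50", "D"),
    ("2020-03-23 12:12:38", "D"), ("2020-03-23 12:13:17", "B"),
    ("2020-03-23 12:14:07", "A"), ("2020-03-23 12:14:54", "A"),
    ("2020-04-29 10:37:09", "A"), ("2020-04-29 10:39:13", "A"),
    ("2020-04-29 11:53:33", "A"), ("2020-04-29 12:04:46", "C"),
    ("2020-04-30 19:15:29", "D"), ("2020-04-30 16:18:04", "B"),
]
df = pd.DataFrame(rows, columns=["Date and time", "Event"])
df["Date and time"] = pd.to_datetime(df["Date and time"])

# Assumed layout: six half-open 4-hour bins, [0, 4), [4, 8), ..., [20, 24).
edges = list(range(0, 25, 4))
labels = [f"{lo:02d}:00-{hi:02d}:00" for lo, hi in zip(edges[:-1], edges[1:])]
time_bin = pd.cut(df["Date and time"].dt.hour, bins=edges, labels=labels, right=False)
df["TimeBin"] = time_bin.astype(str)

# Count rows per (Event, TimeBin); reindex so empty bins show up as zero columns.
counts = pd.crosstab(df["Event"], df["TimeBin"]).reindex(columns=labels, fill_value=0)
print(counts)
```

Because the bins are keyed on the hour of day only, rows from different dates land in the same column, which is what the desired output in the question seems to ask for.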

Answer 1: (score: 0)

Here is an approach using pandas .groupby(), .explode() and .pivot_table():

>>> import pandas as pd
>>> df = pd.DataFrame([i.strip().split('   ') for i in '''  2020-03-23 9:05:03   A
...  2020-03-23 14:06:02   B
...  2020-03-23 9:06:43   B
...  2020-03-23 12:11:50   D
...  2020-03-23 12:12:38   D
...  2020-03-23 12:13:17   B
...  2020-03-23 12:14:07   A
...  2020-03-23 12:14:54   A
...  2020-04-29 10:37:09   A
...  2020-04-29 10:39:13   A
...  2020-04-29 11:53:33   A
...  2020-04-29 12:04:46   C
...  2020-04-30 19:15:29   D
...  2020-04-30 16:18:04   B '''.split('\n')], columns=['Date and time', 'Event'])
>>> df
          Date and time Event
0    2020-03-23 9:05:03     A
1   2020-03-23 14:06:02     B
2    2020-03-23 9:06:43     B
3   2020-03-23 12:11:50     D
4   2020-03-23 12:12:38     D
5   2020-03-23 12:13:17     B
6   2020-03-23 12:14:07     A
7   2020-03-23 12:14:54     A
8   2020-04-29 10:37:09     A
9   2020-04-29 10:39:13     A
10  2020-04-29 11:53:33     A
11  2020-04-29 12:04:46     C
12  2020-04-30 19:15:29     D
13  2020-04-30 16:18:04     B
>>> # convert Date and time column to datetime type
>>> df['Date and time'] = pd.to_datetime(df['Date and time'])
>>> # groupby based on freq 4H
>>> df = df.groupby(pd.Grouper(key='Date and time', freq='4H')).agg(list).explode('Event')
>>> df = df.reset_index().dropna()
>>> # retrieve time value and convert it to time bins
>>> def time_binning(x):
...     return f'{x.time()} - {(x + pd.offsets.DateOffset(hours=3, minutes=59, seconds=59)).time()}'
...
>>> df['time'] = df['Date and time'].apply(time_binning)
>>> # pivot table
>>> df = df.pivot_table(index='Event', columns='time', aggfunc='count', fill_value=0)['Date and time']
>>> df
time   08:00:00 - 11:59:59  12:00:00 - 15:59:59  16:00:00 - 19:59:59
Event
A                        4                    2                    0
B                        1                    2                    1
C                        0                    1                    0
D                        0                    2                    1
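
Neither answer produces the 10:00-14:00, 14:00-18:00, 18:00-22:00, 22:00-02:00 columns the question asks for, because those intervals start at 10:00 and wrap past midnight. One possible sketch, not taken from either answer, is to shift the hour so that 10:00 becomes 0, take it modulo 24, and integer-divide by 4. The start hour of 10 and the names slot, labels and counts are assumptions made for the illustration.

```python
import pandas as pd

# Sample rows copied from the question (hours zero-padded for unambiguous parsing).
rows = [
    ("2020-03-23 09:05:03", "A"), ("2020-03-23 14:06:02", "B"),
    ("2020-03-23 09:06:43", "B"), ("2020-03-23 12:11:50", "D"),
    ("2020-03-23 12:12:38", "D"), ("2020-03-23 12:13:17", "B"),
    ("2020-03-23 12:14:07", "A"), ("2020-03-23 12:14:54", "A"),
    ("2020-04-29 10:37:09", "A"), ("2020-04-29 10:39:13", "A"),
    ("2020-04-29 11:53:33", "A"), ("2020-04-29 12:04:46", "C"),
    ("2020-04-30 19:15:29", "D"), ("2020-04-30 16:18:04", "B"),
]
df = pd.DataFrame(rows, columns=["Date and time", "Event"])
df["Date and time"] = pd.to_datetime(df["Date and time"])

START = 10  # assumed first bin edge, taken from the question's desired columns
WIDTH = 4   # bin width in hours

# Shift so START maps to 0, wrap around midnight, then bucket into 4-hour slots.
slot = ((df["Date and time"].dt.hour - START) % 24) // WIDTH

# Slot i covers [START + 4i, START + 4(i + 1)) on a 24-hour clock.
labels = [
    f"{(START + WIDTH * i) % 24:02d}:00-{(START + WIDTH * (i + 1)) % 24:02d}:00"
    for i in range(24 // WIDTH)
]
df["TimeBin"] = slot.map(dict(enumerate(labels)))

# Count rows per (Event, TimeBin); reindex so empty bins show up as zero columns.
counts = pd.crosstab(df["Event"], df["TimeBin"]).reindex(columns=labels, fill_value=0)
print(counts)
```

With the sample data, the two 9:xx rows fall into the last slot (06:00-10:00) rather than being dropped, and the 22:00-02:00 column stays at zero because no event occurs in that window.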