Pandas: count the days on which the summed duration exceeds 30 minutes

Time: 2020-05-22 09:15:32

Tags: python pandas

Here is a sample of the source data:

ID      Date              Duration
111     2020-01-01        00:42:23
111     2020-01-01        00:23:23
111     2020-01-02        00:37:22
222     2020-01-02        00:13:08
222     2020-01-03        01:52:11
....
999     2020-01-31        00:15:21
999     2020-01-31        00:52:12

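I am using Pandas. I want to sum the durations for each day and then, for each ID, count the number of days in the month on which that daily total is greater than 30 minutes.

This is what I need to get: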

ID      Total days when sum of duration by day from each ID > 30 min (per month)
111     2
222     1
.... 
999     5

Something like this:

    aggregation = {
        'num_days': pd.NamedAgg(column="duration", aggfunc=lambda x: x.sum() > dt.timedelta(minutes=30)),
    }
    total_active = df.groupby('Id').agg(**aggregation)

But this is not what I need at all...

Can anyone help?

3 answers:

Answer 0: (score: 0)

Try this:

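    # Parse HH:MM:SS and express each duration in minutes.
    parsed = pd.to_datetime(df['Duration'], format="%H:%M:%S")
    df['_duration'] = parsed.dt.hour * 60 + parsed.dt.minute + parsed.dt.second / 60

    # Sum the minutes per ID and per day.
    df_g = df.groupby(['ID', 'Date'])['_duration'].sum().reset_index()

    # Keep only the days whose total is greater than 30 minutes.
    df_g = df_g[df_g['_duration'] > 30]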

Reference: pd.to_datetime
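
As a hedged alternative for the conversion step, assuming Duration holds HH:MM:SS strings as in the question, pd.to_timedelta can parse the strings directly and .dt.total_seconds() then gives the length in minutes. The names durations and minutes are illustrative only and do not come from the thread.

```python
import pandas as pd

# A few Duration values copied from the question's sample.
durations = pd.Series(["00:42:23", "00:23:23", "01:52:11"])

# Parse the HH:MM:SS strings as timedeltas, then express them in minutes.
minutes = pd.to_timedelta(durations).dt.total_seconds() / 60

print(minutes.round(2).tolist())
```

Working with timedeltas avoids treating a duration as a clock time, which is what pd.to_datetime does.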

Answer 1: (score: 0)

    print(df)

        ID   Date        Duration
    0   111  2020-01-01  00:42:23
    1   111  2020-01-01  00:23:23
    2   111  2020-01-02  00:37:22
    3   222  2020-01-02  00:13:08
    4   222  2020-01-03  01:52:11
    5   999  2020-01-31  00:15:21
    6   999  2020-01-31  00:52:12

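Use pd.Timedelta to convert the dtype of the Duration column to <m8[ns] (that is, timedelta64[ns]):

    df['Duration'] = df.Duration.apply(pd.Timedelta)

Then use groupby and sum: first sum the durations per ID and date, compare each daily total with 30 minutes, and sum the resulting booleans per ID:

    result = (df.groupby(['ID', 'Date'])['Duration'].sum() > pd.Timedelta('30min')).groupby('ID').sum()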

Output:

    ID
    111    2.0
    222    1.0
    999    1.0
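
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes the frame contains only the rows visible in the question (the elided rows are left out), and the intermediate name daily is illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the elided rows are not included.
df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 999, 999],
    "Date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
             "2020-01-03", "2020-01-31", "2020-01-31"],
    "Duration": ["00:42:23", "00:23:23", "00:37:22", "00:13:08",
                 "01:52:11", "00:15:21", "00:52:12"],
})
df["Duration"] = pd.to_timedelta(df["Duration"])

# Total duration per ID and day.
daily = df.groupby(["ID", "Date"])["Duration"].sum()

# Flag the days above the threshold, then count the flagged days per ID.
result = (daily > pd.Timedelta(minutes=30)).groupby(level="ID").sum()
print(result)
```

Summing a boolean Series counts its True values, which is why the final sum yields the number of qualifying days.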

Answer 2: (score: 0)

I am not sure whether we are summing or counting, but this matches your expected output.

    df['Date'] = pd.to_datetime(df['Date'])  # Coerce Date to datetime
    df['Duration'] = pd.to_timedelta(df['Duration'])  # Coerce Duration to timedelta (HH:MM:SS strings take no unit)
    df.set_index(df['Date'], inplace=True)  # Set the date as the index
    # Group by date and ID, test the condition, then sum per ID.
    (df.groupby([df.index.date, df.ID])['Duration'].sum() > pd.Timedelta('30min')).groupby('ID').sum()
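
The question asks for the count per month, so a possible extension is to label each qualifying day with its month before counting. This is only a sketch under the assumption that the data may span several months; the names daily, active, Month and days_per_month are illustrative and not from the thread.

```python
import pandas as pd

# Sample rows copied from the question; the elided rows are not included.
df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 999, 999],
    "Date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
             "2020-01-03", "2020-01-31", "2020-01-31"],
    "Duration": ["00:42:23", "00:23:23", "00:37:22", "00:13:08",
                 "01:52:11", "00:15:21", "00:52:12"],
})
df["Date"] = pd.to_datetime(df["Date"])
df["Duration"] = pd.to_timedelta(df["Duration"])

# Total duration per ID and calendar day.
daily = df.groupby(["ID", "Date"])["Duration"].sum().reset_index()

# Keep the days above 30 minutes and label each with its month.
active = daily[daily["Duration"] > pd.Timedelta(minutes=30)]
active = active.assign(Month=active["Date"].dt.to_period("M"))

# Count the qualifying days per ID and month.
days_per_month = active.groupby(["ID", "Month"]).size()
print(days_per_month)
```

Note that an ID with no qualifying day in a month does not appear in the result at all, rather than appearing with a count of 0.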