Summarizing a pandas DataFrame by date interval

Time: 2020-01-02 20:22:02

Tags: pandas sum pandas-groupby

I have a pandas DataFrame that looks like this:

    df:

    ts                          src_ip          dst_ip          dst_p   srs_pkts src_bytes  dst_pkts    dst_bytes
0   2019-12-16 15:59:55.621609  185.156.73.60   x.x.x.x         46011   1        40         0           0
1   2019-12-16 19:59:58.368135  185.156.73.60   x.x.x.x         2010    1        40         0           0
2   2019-12-16 15:59:57.108674  185.176.27.182  y.y.y.y         10549   1        40         0           0
3   2019-12-16 22:00:00.774090  89.248.160.193  y.y.y.y         3084    1        40         0           0
4   2019-12-16 11:59:58.927001  89.248.162.161  z.z.z.z         6072    1        40         0           0
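
To reproduce the question, the sample rows above can be rebuilt roughly as follows. This is only a sketch: it assumes `ts` should be a datetime column, and it keeps the column name `srs_pkts` exactly as spelled in the sample header, along with the masked destination IPs.

```python
import pandas as pd

# Rows copied from the sample above; destination IPs are masked in the question.
df = pd.DataFrame({
    "ts": [
        "2019-12-16 15:59:55.621609",
        "2019-12-16 19:59:58.368135",
        "2019-12-16 15:59:57.108674",
        "2019-12-16 22:00:00.774090",
        "2019-12-16 11:59:58.927001",
    ],
    "src_ip": ["185.156.73.60", "185.156.73.60", "185.176.27.182",
               "89.248.160.193", "89.248.162.161"],
    "dst_ip": ["x.x.x.x", "x.x.x.x", "y.y.y.y", "y.y.y.y", "z.z.z.z"],
    "dst_p": [46011, 2010, 10549, 3084, 6072],
    "srs_pkts": [1, 1, 1, 1, 1],      # spelled as in the sample header
    "src_bytes": [40, 40, 40, 40, 40],
    "dst_pkts": [0, 0, 0, 0, 0],
    "dst_bytes": [0, 0, 0, 0, 0],
})

# Time-based grouping needs real timestamps, not strings.
df["ts"] = pd.to_datetime(df["ts"])
print(df.dtypes)
```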

What I want to achieve is the following:

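1. Group the rows into one-hour intervals (using `ts`).

2. Within each one-hour group, group by the following columns: (src_ip, dst_ip, dst_p).

3. Summarize each group by summing the remaining columns (src_pkts, src_bytes, dst_pkts, dst_bytes).

4. Finally, combine these summaries into a new DataFrame, like this: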

new_df:

first_seen_at                    last_seen_at                       src_ip           dst_ip    dst_p    total_src_pkts    total_src_bytes    total_dst_pkts    total_dst_bytes
2019-12-16 15:00:00.000000       2019-12-16 16:00:00.000000         185.156.73.60    x.x.x.x   46011    2                 80                 0                 0               # summary of one hour for one tuple of (src_ip, dst_ip, dst_p)
.
.
.

Any help with the above would be much appreciated.

1 Answer:

Answer 0 (score: 2)

With pandas 0.25+ (needed for named aggregation), try:

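import pandas as pd

# pd.Grouper needs a datetime column, so convert `ts` if it is stored as strings.
df['ts'] = pd.to_datetime(df['ts'])
# Sum the counters per one-hour bin and (src_ip, dst_ip, dst_p) tuple.
# Newer pandas releases prefer the lowercase alias 'h' over 'H'.
df_hour = df.groupby([pd.Grouper(freq='H', key='ts'), 'src_ip', 'dst_ip', 'dst_p']).sum()
# Move the hour bin back into a column, then aggregate once per tuple.
# 'srs_pkts' is the column name exactly as spelled in the question's data.
df_out = df_hour.reset_index('ts').groupby(level=[0, 1, 2])\
                .agg(first_seen_at=('ts', 'first'),
                     last_seen_at=('ts', 'last'),
                     total_src_pkts=('srs_pkts', 'sum'),
                     total_src_bytes=('src_bytes', 'sum'),
                     total_dst_pkts=('dst_pkts', 'sum'),
                     total_dst_bytes=('dst_bytes', 'sum'))\
                .reset_index()
print(df_out)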

Output:

           src_ip   dst_ip  dst_p       first_seen_at        last_seen_at  total_src_pkts  total_src_bytes  total_dst_pkts  total_dst_bytes
0   185.156.73.60  x.x.x.x   2010 2019-12-16 19:00:00 2019-12-16 19:00:00               1               40               0                0
1   185.156.73.60  x.x.x.x  46011 2019-12-16 15:00:00 2019-12-16 15:00:00               1               40               0                0
2  185.176.27.182  y.y.y.y  10549 2019-12-16 15:00:00 2019-12-16 15:00:00               1               40               0                0
3  89.248.160.193  y.y.y.y   3084 2019-12-16 22:00:00 2019-12-16 22:00:00               1               40               0                0
4  89.248.162.161  z.z.z.z   6072 2019-12-16 11:00:00 2019-12-16 11:00:00               1               40               0                0
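
Note that the second `groupby` above groups by the tuple only, so a tuple seen in several different hours ends up as a single row spanning its first and last hour bin. If one row per hour and tuple is wanted instead, as the example row in the question suggests, a possible variant is sketched below. It is not part of the original answer. The names `keys`, `hour_start` and `out` are illustrative, the 15:10:00 row is made up so that one bin holds two rows, and `last_seen_at` is assumed to mean the end of the hour bin.

```python
import pandas as pd

# Two rows from the question plus one hypothetical row (15:10:00)
# that falls into the same hour and tuple as the 15:59:55 row.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2019-12-16 15:10:00.000000",   # hypothetical extra row
        "2019-12-16 15:59:55.621609",
        "2019-12-16 19:59:58.368135",
    ]),
    "src_ip": ["185.156.73.60"] * 3,
    "dst_ip": ["x.x.x.x"] * 3,
    "dst_p": [46011, 46011, 2010],
    "srs_pkts": [1, 1, 1],              # spelled as in the question's data
    "src_bytes": [40, 40, 40],
    "dst_pkts": [0, 0, 0],
    "dst_bytes": [0, 0, 0],
})

keys = ["src_ip", "dst_ip", "dst_p"]

# Floor every timestamp to the start of its hour; this becomes the bin label.
hour_start = df["ts"].dt.floor("h").rename("first_seen_at")

# One row per (hour, src_ip, dst_ip, dst_p), summing the counters.
out = (
    df.groupby([hour_start, *keys])
      .agg(total_src_pkts=("srs_pkts", "sum"),
           total_src_bytes=("src_bytes", "sum"),
           total_dst_pkts=("dst_pkts", "sum"),
           total_dst_bytes=("dst_bytes", "sum"))
      .reset_index()
)

# Assumption: last_seen_at marks the end of the one-hour bin.
out.insert(1, "last_seen_at", out["first_seen_at"] + pd.Timedelta(hours=1))
print(out)
```

With this data the 15:00 bin for port 46011 sums to 2 packets and 80 bytes, which matches the shape of the row sketched in the question.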