I am trying to analyze network trace data using pandas. I have read the dump file and created the following DataFrame:
To detect the individual flows in the DataFrame data2, I grouped the entire DataFrame by ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service'] using the following code:
flow = ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service']
grp1 = data2.groupby(flow, sort=False)
When I call grp1.size() on the first twenty rows of data2, I get the following information:
What I would like to do now is calculate, for each flow, the mean of ip_len and packet_len, the variance (var) of ip_len and packet_len, and the mean of the interpacket arrival times (using the timestamps of packets belonging to the same flow).
How can I accomplish this in pandas so that the resulting DataFrame contains the statistics of each flow, i.e. its columns are ip_src, ip_dst, sport, dport, ip_proto, service, and the mean and var values described above? I have tried both the agg and apply methods, but haven't been able to get it to work. Thanks in advance!
Answer 0 (score: 1)
outDegreeOf() should do it.
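For reference, here is a minimal sketch of one way the per-flow statistics described in the question could be computed with groupby and named aggregation. It is not taken from the original post: the sample rows are made up, the column name timestamp is an assumption (the question only mentions "timestamps"), and the helper column iat and the output column names are hypothetical. Named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

# Made-up sample standing in for data2. Column names follow the question;
# 'timestamp' is an assumed name for the packet capture time in seconds.
data2 = pd.DataFrame({
    "ip_src": ["10.0.0.1"] * 3 + ["10.0.0.2"] * 2,
    "ip_dst": ["10.0.0.9"] * 5,
    "sport": [1234] * 3 + [5678] * 2,
    "dport": [80] * 5,
    "ip_proto": [6] * 5,
    "service": ["http"] * 5,
    "ip_len": [40, 60, 80, 100, 200],
    "packet_len": [54, 74, 94, 114, 214],
    "timestamp": [0.0, 1.0, 3.0, 10.0, 10.5],
})

flow = ["ip_src", "ip_dst", "sport", "dport", "ip_proto", "service"]

# Interpacket arrival time: difference between consecutive timestamps
# within the same flow. The first packet of each flow gets NaN, which
# 'mean' skips by default.
data2 = data2.sort_values("timestamp")
data2["iat"] = data2.groupby(flow, sort=False)["timestamp"].diff()

# One row per flow: the flow keys plus the requested statistics.
# 'var' is the sample variance (ddof=1), the pandas default.
stats = (
    data2.groupby(flow, sort=False)
    .agg(
        ip_len_mean=("ip_len", "mean"),
        ip_len_var=("ip_len", "var"),
        packet_len_mean=("packet_len", "mean"),
        packet_len_var=("packet_len", "var"),
        iat_mean=("iat", "mean"),
    )
    .reset_index()  # turn the flow keys back into ordinary columns
)

print(stats)
```

Flows containing a single packet will show NaN for the variance and interpacket columns, since neither quantity is defined for one observation.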