Question

I am trying to analyze network trace data using pandas. I have read the dump file and created the following DataFrame:

To detect the individual flows in the DataFrame data2, I grouped it by ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service'] using the following code:

flow = ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service']
grp1 = data2.groupby(flow, sort=False)

When I call grp1.size() on the first twenty rows of data2, I get the following:

What I would like to do now is calculate the mean and variance of ip_len and packet_len, and the mean of the inter-packet arrival times (using the timestamps of packets belonging to the same flow).

How can I accomplish this in pandas so that the resulting DataFrame contains the statistics of each flow, i.e. its columns are ip_src, ip_dst, sport, dport, ip_proto, service, and the mean and variance values described above? I have tried both the agg and apply methods, but haven't been able to make it work. Thanks in advance!

Answer 1

outDegreeOf() should do it.

Pandas: Calculate mean, var of similar columns grouped together

1 answer:
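
One possible approach, sketched here under some assumptions: the question does not show the DataFrame, so the sample rows below are made up, and the timestamp column name `timestamp` is a guess. The idea is to compute the per-flow inter-arrival time with a grouped `diff()`, then use named aggregation (available in pandas 0.25 and later) to get one row of statistics per flow.

```python
import pandas as pd

# Hypothetical sample standing in for data2; the real column values are not
# shown in the question, and the 'timestamp' column name is an assumption.
data2 = pd.DataFrame({
    "ip_src": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.2"],
    "ip_dst": ["10.0.0.9"] * 5,
    "sport": [1234, 1234, 5555, 1234, 5555],
    "dport": [80, 80, 53, 80, 53],
    "ip_proto": [6, 6, 17, 6, 17],
    "service": ["http", "http", "dns", "http", "dns"],
    "ip_len": [40, 60, 70, 80, 90],
    "packet_len": [54, 74, 84, 94, 104],
    "timestamp": [0.0, 1.0, 1.5, 3.0, 2.5],
})

flow = ['ip_src', 'ip_dst', 'sport', 'dport', 'ip_proto', 'service']

# Inter-arrival time: difference between consecutive timestamps within a flow.
# Sorting by time first keeps the differences non-negative; the first packet
# of each flow gets NaN, which mean() skips by default.
data2 = data2.sort_values("timestamp")
data2["iat"] = data2.groupby(flow, sort=False)["timestamp"].diff()

# One row per flow; reset_index() turns the flow keys back into columns.
stats = (
    data2.groupby(flow, sort=False)
    .agg(
        ip_len_mean=("ip_len", "mean"),
        ip_len_var=("ip_len", "var"),
        packet_len_mean=("packet_len", "mean"),
        packet_len_var=("packet_len", "var"),
        iat_mean=("iat", "mean"),
    )
    .reset_index()
)

print(stats)
```

Note that `var` here is the sample variance (ddof=1), so a flow with a single packet gets NaN for both the variance and the mean inter-arrival time.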