我有一个数据框:
In [59]: df.head()
Out[59]:
src-subnet dst-subnet bytes Start
0 ATL Internet 180 1477252800
1 ATL ATL 206 1477252800
2 ATL Private-10.0.0.0 17160 1477252800
3 SAT Internet 15768 1477252800
4 AWS-CloudFront ATL 1592 1477252800
对于每个start值,我想要在每个可能的src-subnet
和dst-subnet
对之间传输的字节总和。
我写了以下内容:
g_start = df.groupby('Start')
g_start_subnet = g_start.apply(lambda x: x.groupby(['src-subnet', 'dst-subnet']).sum())
我获得了:
```
In [60]: g_start_subnet
Out[60]:
bytes
Start src-subnet dst-subnet
1476896400 ATL ATL 5190647907
AWS 67442
AWS-CloudFront 3523866
AWS-EC2 10231320
Broadcast-255.255.255.255 550470
COL 25459487
Corvil-CNE 5130
DEN 68247258
DFB 49930518
DFW 109329393
HOU 43601191
IND 97444445
Internal Network 2295849
Internet 72555057
JES 64789642
LAF 30442304
LOU 27372802
MGO 307340
Multicast-224.0.0.0 25339730
NOL 49526023
Private-10.0.0.0 814280925
Private-172.16.0.0 9348122
Private-192.168.0.0 8907521
SAT 43590577
SCI 2473674
WMSApp 74085307
WMSDB 9604726
AWS ATL 78309
AWS-CloudFront ATL 10026686
HOU 9285
... ...
1477497600 Private-10.0.0.0 Internet 90581028
Link-169.254.0.0 276
Private-10.0.0.0 159103776
Private-192.168.0.0 60724723
WMSApp 1100
Private-100.64.0.0 ATL 1304731
Private-192.168.0.0 ATL 77031
Internet 926
Multicast-224.0.0.0 238220
Private-10.0.0.0 19320
Private-192.168.0.0 393982
SAT ATL 18278815
AWS-EC2 28380
Internet 42773386
Private-10.0.0.0 8030664
Private-172.16.0.0 230389
Private-192.168.0.0 5938773
SCI ATL 2055407
Internet 900
Multicast-224.0.0.0 256
Solace-Servers Broadcast-255.255.255.255 153438
WMSApp ATL 95861882
AWS-EC2 360
DFB 3066
DFW 9000
Internet 3330
Multicast-224.0.0.0 31292117
Private-10.0.0.0 1022
WMSDB ATL 8678562
Internet 2160
我怎么知道删除多索引并获得像数据帧这样的矩阵?
最终结果应包含Start, bytes, src-subnet, dst-subnet
答案 0 :(得分:2)
我认为你可以使用:
g_start_subnet = g_start_subnet.reset_index()
但也许更好用:
print (df.groupby(['Start', 'src-subnet', 'dst-subnet'], as_index=False).sum())