Using pandas to delete the rows before a specific row in a CSV file, based on the values in that row

Time: 2019-03-22 12:52:22

Tags: python linux pandas csv

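I have a CSV file as shown below. I want to delete all rows before the row whose values are [Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed ESSIDs], so that I can process the rest further. I am using the pandas library in Python to read this CSV file. I can delete specific rows by index, but the file may be reloaded with changes after a few seconds. How should I do this? Any help would be appreciated.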

BSSID, First time seen, Last time seen, channel, Speed, Privacy, Cipher, Authentication, Power, # beacons, # IV, LAN IP, ID-length, ESSID, Key
52:62:00:00:03:01, 2018-06-22 11:23:45, 2018-06-22 11:23:45,  9,  -1, , ,   ,  -1,        0,        0,   0.  0.  0.  0,   0, , 
14:30:04:B2:F5:42, 2018-06-22 11:24:04, 2018-06-22 11:24:04, 11,  -1, WPA, ,   , -88,        0,        1,   0.  0.  0.  0,   0, , 
14:30:04:D6C:95:62, 2018-06-22 11:23:50, 2018-06-22 11:24:08,  6,  -1, WPA, ,   , -85,        0,        2,   0.  0.  0.  0,   0, , 
DC:09:4C:BF:6B:13, 2018-06-22 11:23:58, 2018-06-22 11:24:06,  7,  54, WPA2, CCMP, PSK, -75,        2,        0,   0.  0.  0.  0,  12, Death Stroke, 
B4:FB:N4:97:F8:03, 2018-06-22 11:23:46, 2018-06-22 11:24:12,  6,  54, WPA2, CCMP, PSK, -74,        6,        6,   0.  0.  0.  0,   3, CSE, 
C4:A8:1D:9K:B9:E8, 2018-06-22 11:23:57, 2018-06-22 11:24:12, 11,  22, WPA2 WPA, CCMP TKIP, PSK, -71,        1,        1,   0.  0.  0.  0,  20, SE-IX (Faculty Only), 
78:8A:90:81:C1:31, 2018-06-22 11:23:54, 2018-06-22 11:24:06,  6,  54, WPA2, CCMP, PSK, -71,        4,        0,   0.  0.  0.  0,   3, CSE, 
78:8A:20:49:^9:D1, 2018-06-22 11:23:44, 2018-06-22 11:24:12, 11,  54, WPA2, CCMP, PSK, -41,       58,       21,   0.  0.  0.  0,   3, CSE, 
14:30:04:B3:FD:A2, 2018-06-22 11:23:46, 2018-06-22 11:24:12,  6,  -1, , ,   ,  -1,        0,        0,   0.  0.  0.  0,   0, , 
14:30:KL:B3:52:22, 2018-06-22 11:23:47, 2018-06-22 11:24:12,  2,  -1, WPA, ,   ,  -1,        0,       50,   0.  0.  0.  0,   0, , 
14:30:04:LC:9B:E2, 2018-06-22 11:23:48, 2018-06-22 11:24:01,  3,  -1, , ,   ,  -1,        0,        0,   0.  0.  0.  0,   0, , 
14:U0:04:B3:52:62, 2018-06-22 11:23:49, 2018-06-22 11:24:12, 11,  -1, WPA, ,   ,  -1,        0,       92,   0.  0.  0.  0,   0, , 

Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed ESSIDs
macaddrees, 2018-06-22 11:23:45, 2018-06-22 11:23:45, -78,        8, 52:62:90:00:03:01,
macaddress, 2018-06-22 11:23:46, 2018-06-22 11:24:05, -73,        4, 14:30:04:BB:19:A2,
macaddress, 2018-06-22 11:23:52, 2018-06-22 11:24:12, -73,        5, (not associated) ,
macaddress, 2018-06-22 11:23:43, 2018-06-22 11:24:12, -71,        9, not assocaited,
macadress, 2018-06-22 11:23:52, 2018-06-22 11:23:52, -70,        2, (not associated) ,
macaddress, 2018-06-22 11:23:48, 2018-06-22 11:24:01, -69,       11, NAN,
macaddress, 2018-06-22 11:23:46, 2018-06-22 11:24:12, -65,       15, NAN,
macaddress, 2018-06-22 11:24:12, 2018-06-22 11:24:12, -62,        2, (not associated) ,
macaddress, 2018-06-22 11:24:01, 2018-06-22 11:24:12, -54,        2, NAN,
macaddress, 2018-06-22 11:23:49, 2018-06-22 11:24:12, -48,       97, NAN,
macaddress, 2018-06-22 11:23:43, 2018-06-22 11:24:12, -35,       54, NAN,UET Smart University
macaddress, 2018-06-22 11:23:47, 2018-06-22 11:24:03, -23,      108, NAN,
macaddress, 2018-06-22 11:23:49, 2018-06-22 11:23:49,  -1,        9, NAN,

2 answers:

Answer 0 (score: 1)

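We can read the file and split its contents into a list s of two strings: one holding everything before the blank line (using \n\n as the separator) and one holding everything after it. Once that is done, we can read each string as CSV into its own DataFrame: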

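import io
import pandas as pd

# Split the text at the blank line, then parse each half as CSV.
with open('test.csv') as f:
    s = f.read().split('\n\n')

df1 = pd.read_csv(io.StringIO(s[0]))  # io.StringIO replaces the removed pd.compat.StringIO
df2 = pd.read_csv(io.StringIO(s[1]))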

df1:

                BSSID       First time seen        Last time seen   channel  \
0   52:62:00:00:03:01   2018-06-22 11:23:45   2018-06-22 11:23:45         9   
1   14:30:04:B2:F5:42   2018-06-22 11:24:04   2018-06-22 11:24:04        11   
2  14:30:04:D6C:95:62   2018-06-22 11:23:50   2018-06-22 11:24:08         6   
3   DC:09:4C:BF:6B:13   2018-06-22 11:23:58   2018-06-22 11:24:06         7   
4   B4:FB:N4:97:F8:03   2018-06-22 11:23:46   2018-06-22 11:24:12         6   
...

df2:

  Station MAC       First time seen        Last time seen   Power   # packets  \
0  macaddrees   2018-06-22 11:23:45   2018-06-22 11:23:45     -78           8   
1  macaddress   2018-06-22 11:23:46   2018-06-22 11:24:05     -73           4   
2  macaddress   2018-06-22 11:23:52   2018-06-22 11:24:12     -73           5   
3  macaddress   2018-06-22 11:23:43   2018-06-22 11:24:12     -71           9   
4   macadress   2018-06-22 11:23:52   2018-06-22 11:23:52     -70           2   
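
The split-on-blank-line idea can also be sketched in a self-contained form. The helper name `split_sections` and the inline `sample` text (including its MAC addresses) are hypothetical and exist only for illustration. `skipinitialspace=True` is an addition not present in the answer; it is assumed to be wanted so that the padding after each comma does not end up in the column names and values.

```python
import io
import re

import pandas as pd

# Hypothetical miniature of the file from the question: two sections
# separated by a blank line. The MAC addresses are made up.
sample = (
    "BSSID, First time seen, channel, Power\n"
    "52:62:00:00:03:01, 2018-06-22 11:23:45,  9,  -1\n"
    "14:30:04:B2:F5:42, 2018-06-22 11:24:04, 11, -88\n"
    "\n"
    "Station MAC, First time seen, Power, # packets\n"
    "AA:BB:CC:00:00:01, 2018-06-22 11:23:45, -78,        8\n"
    "AA:BB:CC:00:00:02, 2018-06-22 11:23:46, -73,        4\n"
)


def split_sections(text):
    """Split text on blank lines and parse each non-empty chunk as CSV."""
    chunks = [c for c in re.split(r"\n\s*\n", text.strip()) if c.strip()]
    # skipinitialspace=True drops the padding after each comma.
    return [pd.read_csv(io.StringIO(c), skipinitialspace=True) for c in chunks]


access_points, stations = split_sections(sample)
print(stations.columns.tolist())
```

Splitting with a regular expression rather than a literal `'\n\n'` is a design choice here: it tolerates a separator line that contains stray spaces, which a plain string split would miss.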

Answer 1 (score: 0)

If the number of rows before Station MAC is the same in every file you work with, and you know how many there are, you can tell pandas read_csv to skip those rows:

df1 = pd.read_csv('filename.csv', skiprows=14)

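Here 14 is the number of rows to skip (simply count the rows before Station MAC). You can also pass a list instead of an integer to specify exactly which rows to skip.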
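
Because the file in the question changes between reloads, the number of lines before the Station MAC header may not stay fixed. A minimal sketch of one way around that is to scan for the header line first and pass its position to `skiprows`. The helper name `read_station_section` and the inline sample are hypothetical, and the sketch assumes the header line starts with the literal text `Station MAC`.

```python
import io

import pandas as pd


def read_station_section(source, marker="Station MAC"):
    """Return the rows from the line starting with `marker` onwards.

    `source` is a seekable text file object; `marker` is assumed to be
    the first text on the header line of the wanted section.
    """
    for line_number, line in enumerate(source):
        if line.lstrip().startswith(marker):
            break
    else:
        raise ValueError(f"no line starts with {marker!r}")
    source.seek(0)  # rewind so pandas sees the whole file again
    # skiprows counts physical lines, blank ones included, so the
    # zero-based index of the header is the number of lines to skip.
    return pd.read_csv(source, skiprows=line_number, skipinitialspace=True)


# Hypothetical miniature of the file; the MAC addresses are made up.
sample = io.StringIO(
    "BSSID, First time seen, channel\n"
    "52:62:00:00:03:01, 2018-06-22 11:23:45,  9\n"
    "\n"
    "Station MAC, First time seen, Power\n"
    "AA:BB:CC:00:00:01, 2018-06-22 11:23:45, -78\n"
)
stations = read_station_section(sample)
print(stations)
```

With a real file, the same helper would be called on an open file object, for example inside `with open('filename.csv') as f:`.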