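I have a sample test.csv file that has been read into a pandas DataFrame.
It has 20 rows and 7 columns (not counting the blank separator row).
The CSV file captures information about SIP calls, but the SIP messages within each call are out of order. In this example there are 2 SIP calls, separated by a blank row.
The problem I am trying to solve is putting the SIP messages back in the correct order.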
>>> dataframe = pd.read_csv('test.csv')
>>> print(dataframe)
frame.number frame.time ip.src ip.dst sip.Call-ID sip.Method sip.Status-Code
0 25355.0 May 9, 2019 15:57:01.433623000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 ACK NaN
1 25148.0 May 9, 2019 15:57:01.363890000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 200.0
2 18371.0 May 9, 2019 15:56:59.411452000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 INVITE NaN
3 18403.0 May 9, 2019 15:56:59.421261000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 100.0
4 25134.0 May 9, 2019 15:57:01.360769000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 183.0
5 20875.0 May 9, 2019 15:57:00.064251000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 180.0
6 19244.0 May 9, 2019 15:56:59.694785000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 100.0
7 19227.0 May 9, 2019 15:56:59.690747000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 INVITE NaN
8 19022.0 May 9, 2019 15:56:59.620685000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 407.0
9 19221.0 May 9, 2019 15:56:59.689779000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 ACK NaN
10 NaN NaN NaN NaN NaN NaN NaN
11 25356.0 May 9, 2019 15:57:01.433623000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 ACK NaN
12 25149.0 May 9, 2019 15:57:01.363890000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 200.0
13 18372.0 May 9, 2019 15:56:59.411452000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 INVITE NaN
14 18404.0 May 9, 2019 15:56:59.421261000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 100.0
15 25135.0 May 9, 2019 15:57:01.360769000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 183.0
16 20876.0 May 9, 2019 15:57:00.064251000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 180.0
17 19245.0 May 9, 2019 15:56:59.694785000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 100.0
18 19228.0 May 9, 2019 15:56:59.690747000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 INVITE NaN
19 19023.0 May 9, 2019 15:56:59.620685000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 407.0
20 19222.0 May 9, 2019 15:56:59.689779000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 ACK NaN
Once the rows of the dataframe have been reordered, I will insert a new column and classify the calls.
dataframe.insert(0, "Classified", " ")
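If the SIP messages are out of order, I cannot classify the calls correctly.
I have looked at pandas sort_index() and sort_values(), but they only cover part of the logic this problem needs.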
>>> dataframe.sort_values(by=['sip.Call-ID'], inplace=True)
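This sorts the dataframe by the sip.Call-ID column. These values are unique to each SIP call, so this groups each call's messages together.
The values in frame.number should help solve the rest. However, they can only be sorted within each unique sip.Call-ID, not across the whole dataframe, otherwise the SIP calls would overlap. The pseudocode logic in my head is as follows: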
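As a side note, sort_values() accepts a list of keys, so a second key can order the rows inside each group. The following is only a minimal sketch on made-up toy data (the frame `toy` and its values are assumptions for illustration, not the real test.csv):

```python
import numpy as np
import pandas as pd

# Toy stand-in for test.csv: two calls, a blank separator row, shuffled frames.
toy = pd.DataFrame({
    "frame.number": [30.0, 10.0, 20.0, np.nan, 31.0, 11.0],
    "sip.Call-ID": ["b", "b", "b", np.nan, "a", "a"],
})

# The first key groups each call; the second key orders frames inside the call.
by_call = toy.sort_values(by=["sip.Call-ID", "frame.number"])
print(by_call)
```

With the default settings the calls come out in Call-ID order rather than in their original order, and the blank row moves to the end, so this alone does not reproduce the original layout.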
for each unique sip.Call-ID in dataframe:
    store its related frame.number values
    check if the next frame.number is smaller/bigger
    reorder rows based on the condition above
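The difficulty I am having is accessing the index of each row, knowing how to reorder the rows for each unique sip.Call-ID, and applying that reordering to the dataframe.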
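One possible direction, sketched under the assumption that every call is delimited by a fully empty row as in the sample: derive a block id from the blank rows and sort on that id plus frame.number, which avoids touching row indexes by hand. The helper name `sort_within_calls`, the temporary `_block` column, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def sort_within_calls(df, key="frame.number"):
    """Sort rows by `key` inside each blank-row-delimited block (hypothetical helper)."""
    # A fully empty row marks the start of the next call; cumsum turns those
    # markers into a block id: 0 for the first call, 1 for the second, and so on.
    block = df.isna().all(axis=1).cumsum()
    return (
        df.assign(_block=block)
          # NaN first keeps each separator row at the top of its own block.
          .sort_values(["_block", key], na_position="first")
          .drop(columns="_block")
          .reset_index(drop=True)
    )

# Toy stand-in for test.csv: two calls separated by a blank row.
toy = pd.DataFrame({
    "frame.number": [30.0, 10.0, 20.0, np.nan, 31.0, 11.0],
    "sip.Call-ID": ["b", "b", "b", np.nan, "a", "a"],
})
result = sort_within_calls(toy)
print(result)
```

Because the block id follows the order in which the calls appear, the calls and the separator rows stay where they were and only the rows inside each call move. If the file had no blank rows, the block id would need to come from sip.Call-ID instead.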
>>> frame_values = dataframe['frame.number'].values
>>> print(frame_values)
[25355. 25148. 18371. 18403. 25134. 20875. 19244. 19227. 19022. 19221.
nan 25356. 25149. 18372. 18404. 25135. 20876. 19245. 19228. 19023.
19222.]
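The expected result is shown below. For each unique sip.Call-ID, the frame numbers are in ascending order and the related SIP messages are now in sequence as well. The frame times illustrate this further, since they are also in ascending order. That means the SIP messages are definitely in order.
By related SIP messages, I mean that the sip.Method and sip.Status-Code columns are now in sequence.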
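The "ascending within each call" condition can also be checked mechanically. This is a minimal sketch, assuming the column names from the sample; the function name `calls_in_order` and the two toy frames are hypothetical.

```python
import pandas as pd

def calls_in_order(df):
    """Return one boolean per Call-ID: True when its frame numbers ascend."""
    # groupby drops rows whose key is NaN, so a blank separator row is ignored.
    return df.groupby("sip.Call-ID", sort=False)["frame.number"].apply(
        lambda s: s.is_monotonic_increasing
    )

# Toy data: the same two calls, once shuffled and once in order.
shuffled = pd.DataFrame({
    "frame.number": [30.0, 10.0, 20.0, 31.0, 11.0],
    "sip.Call-ID": list("bbbaa"),
})
ordered = pd.DataFrame({
    "frame.number": [10.0, 20.0, 30.0, 11.0, 31.0],
    "sip.Call-ID": list("bbbaa"),
})
print(calls_in_order(shuffled))
print(calls_in_order(ordered))
```

The check only looks at frame.number; confirming that frame.time ascends too would require parsing the timestamp strings first.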
44.44.44.44 55.55.55.55
| |
| INVITE | First SIP Message: INVITE Method
|----------------------->|
| 100 trying | Second SIP Message: 100 Status Code
|<-----------------------|
| 407 Proxy Auth | Third SIP Message: 407 Status Code
|<-----------------------|
| |
| ACK | Fourth SIP Message: ACK Method
|----------------------->|
| INVITE | Fifth SIP Message: INVITE Method
|----------------------->|
| |
| 100 trying | Sixth SIP Message: 100 Status Code
|<-----------------------|
| 180 ringing | Seventh SIP Message: 180 Status Code
|<-----------------------|
| 183 session | Eighth SIP Message: 183 Status Code
|<-----------------------|
| 200 OK | Ninth SIP Message: 200 Status Code
|<-----------------------|
| ACK | Tenth SIP Message: ACK Method
|----------------------->|
>>> print(dataframe)
frame.number frame.time ip.src ip.dst sip.Call-ID sip.Method sip.Status-Code
0 18371.0 May 9, 2019 15:56:59.411452000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 INVITE NaN
1 18403.0 May 9, 2019 15:56:59.421261000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 100.0
2 19022.0 May 9, 2019 15:56:59.620685000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 407.0
3 19221.0 May 9, 2019 15:56:59.689779000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 ACK NaN
4 19227.0 May 9, 2019 15:56:59.690747000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 INVITE NaN
5 19244.0 May 9, 2019 15:56:59.694785000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 100.0
6 20875.0 May 9, 2019 15:57:00.064251000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 180.0
7 25134.0 May 9, 2019 15:57:01.360769000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 183.0
8 25148.0 May 9, 2019 15:57:01.363890000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e25a5083a8f2c85 NaN 200.0
9 25355.0 May 9, 2019 15:57:01.433623000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e25a5083a8f2c85 ACK NaN
10 NaN NaN NaN NaN NaN NaN NaN
11 18372.0 May 9, 2019 15:56:59.411452000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 INVITE NaN
12 18404.0 May 9, 2019 15:56:59.421261000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 100.0
13 19023.0 May 9, 2019 15:56:59.620685000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 407.0
14 19222.0 May 9, 2019 15:56:59.689779000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 ACK NaN
15 19228.0 May 9, 2019 15:56:59.690747000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 INVITE NaN
16 19245.0 May 9, 2019 15:56:59.694785000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 100.0
17 20876.0 May 9, 2019 15:57:00.064251000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 180.0
18 25135.0 May 9, 2019 15:57:01.360769000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 183.0
19 25149.0 May 9, 2019 15:57:01.363890000 IST 55.55.55.55 44.44.44.44 0018506d493d00005e234fs23osd9212 NaN 200.0
20 25356.0 May 9, 2019 15:57:01.433623000 IST 44.44.44.44 55.55.55.55 0018506d493d00005e234fs23osd9212 ACK NaN