How to reorder pandas DataFrame rows based on unique column values

Asked: 2019-05-11 16:49:40

Tags: python pandas csv

I have a sample test.csv file that is read into a pandas DataFrame.

It has 20 data rows and 7 columns (plus one blank separator row, so the DataFrame index runs from 0 to 20).

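The CSV file captures information about SIP calls, but the SIP messages for each call are out of order. In this example there are 2 SIP calls, separated by a blank row.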
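
One detail worth noting: the all-NaN row at index 10 suggests the separator line in test.csv contains only delimiters, since `read_csv` skips truly empty lines by default (`skip_blank_lines=True`). Below is a minimal sketch of how such a row could be detected and dropped before sorting. It assumes a shortened, made-up stand-in for the file; the `csv_text` string and the `call-a`/`call-b` IDs are illustrative, not the real data.

```python
import io

import pandas as pd

# Hypothetical, shortened stand-in for test.csv: two calls separated by a
# line that contains only delimiters.
csv_text = """frame.number,sip.Call-ID,sip.Method,sip.Status-Code
25355,call-a,ACK,
18371,call-a,INVITE,
,,,
25356,call-b,ACK,
18372,call-b,INVITE,
"""

raw = pd.read_csv(io.StringIO(csv_text))

# The delimiter-only line is parsed as a row in which every value is NaN.
separator_mask = raw.isna().all(axis=1)

# Drop rows where every column is NaN so that only real SIP messages remain.
messages = raw.dropna(how="all").reset_index(drop=True)

print(int(separator_mask.sum()), "separator row(s),", len(messages), "message row(s)")
```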

The problem I want to solve is putting the SIP messages back into the correct order.

>>> import pandas as pd
>>> dataframe = pd.read_csv('test.csv')
>>> print(dataframe)
    frame.number                           frame.time       ip.src       ip.dst                       sip.Call-ID sip.Method  sip.Status-Code
0        25355.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
1        25148.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            200.0
2        18371.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
3        18403.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
4        25134.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            183.0
5        20875.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            180.0
6        19244.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
7        19227.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
8        19022.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            407.0
9        19221.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
10           NaN                                  NaN          NaN          NaN                               NaN        NaN              NaN
11       25356.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN
12       25149.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            200.0
13       18372.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
14       18404.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
15       25135.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            183.0
16       20876.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            180.0
17       19245.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
18       19228.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
19       19023.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            407.0
20       19222.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN

After successfully reordering the DataFrame rows, I will insert a new column and classify the calls.

dataframe.insert(0, "Classified", " ")

If the SIP messages are out of order, I won't be able to classify the calls correctly.

I have looked at pandas' sort_index() and sort_values(), but they only cover part of the logic needed to solve this.

>>> dataframe.sort_values(by=['sip.Call-ID'], inplace=True)

This sorts the DataFrame by the sip.Call-ID column. These values are unique to each SIP call, so sorting on them groups each call's messages together.

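The values in frame.number should help solve this. However, they must be sorted within each unique sip.Call-ID rather than across the whole DataFrame, otherwise the SIP calls would overlap. The pseudocode logic I have in mind is: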

for each unique sip.Call-ID in dataframe:
    store its related frame.number values
    check if the next frame.number is smaller/bigger
    reorder rows based on the condition above
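
One way the pseudocode above could be expressed without an explicit loop is to pass both columns to `sort_values`, so rows are ordered by call first and by frame number within each call. This is a minimal sketch under the assumption that the data looks like the shortened, made-up frame below (the `id-25a5`/`id-234f` IDs are stand-ins for the real Call-IDs):

```python
import pandas as pd

# Hypothetical, shortened data; the short Call-IDs stand in for the real ones.
df = pd.DataFrame({
    "frame.number": [25355.0, 18371.0, 19022.0, 25356.0, 18372.0, 19023.0],
    "sip.Call-ID": ["id-25a5", "id-25a5", "id-25a5",
                    "id-234f", "id-234f", "id-234f"],
})

# Sort by call first, then by frame number inside each call.
ordered = df.sort_values(by=["sip.Call-ID", "frame.number"]).reset_index(drop=True)

print(ordered)
```

Note that this also orders the calls themselves by Call-ID string. With the IDs in the question, the call ending in 9212 would come first, because "...5e234f..." sorts before "...5e25a5...". Any all-NaN separator row would end up at the bottom, since `sort_values` places NaN last by default.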

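The difficulty I'm having is accessing each row's index, working out how to reorder the rows for each unique sip.Call-ID, and applying that reordering to the DataFrame.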
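
A possible way to work with the row index directly is to iterate over `groupby` groups, collect the index labels of each call in frame-number order, and then select the rows with `.loc`. The sketch below assumes made-up, shortened data and that the all-NaN rows are separators to be kept between calls; `separator_labels`, `labels`, and `reordered` are illustrative names.

```python
import pandas as pd

# Hypothetical, shortened data. The all-NaN row mimics the blank separator.
df = pd.DataFrame({
    "frame.number": [25355.0, 18371.0, None, 25356.0, 18372.0],
    "sip.Call-ID": ["id-25a5", "id-25a5", None, "id-234f", "id-234f"],
    "sip.Method": ["ACK", "INVITE", None, "ACK", "INVITE"],
})

# Index labels of separator rows (every column is NaN).
separator_labels = list(df.index[df.isna().all(axis=1)])

# Real messages only; rows with a NaN Call-ID would not form a group anyway.
messages = df.dropna(how="all")

labels = []
# sort=False keeps the calls in order of first appearance instead of
# sorting them by Call-ID string.
for i, (_, group) in enumerate(messages.groupby("sip.Call-ID", sort=False)):
    if i > 0 and separator_labels:
        # Reuse an original separator row between consecutive calls.
        labels.append(separator_labels.pop(0))
    # Index labels of this call's rows, ordered by frame number.
    labels.extend(group.sort_values("frame.number").index)

# Apply the new row order and renumber the index from 0.
reordered = df.loc[labels].reset_index(drop=True)

print(reordered)
```

Unlike sorting on sip.Call-ID, this keeps the calls in their original order, which is what the expected output in the question shows.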

>>> frame_values = dataframe['frame.number'].values
>>> print(frame_values)
[25355. 25148. 18371. 18403. 25134. 20875. 19244. 19227. 19022. 19221.
    nan 25356. 25149. 18372. 18404. 25135. 20876. 19245. 19228. 19023.
 19222.]

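The expected result is shown below. For each unique sip.Call-ID, the frame numbers are in ascending order, and the related SIP messages are now in order as well. The frame times illustrate this further, since they are also ascending, which means the SIP messages are definitely in order.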

By related SIP messages, I mean that the sip.Method and sip.Status-Code columns are now in sequence.
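
Since every row carries either a method or a status code, the two columns could be merged into a single label to make the sequence easier to read. This is only a sketch on made-up rows; the `sip.Message` column name is a hypothetical helper, not something present in the CSV.

```python
import pandas as pd

# Hypothetical, shortened data: each row has a method or a status code.
df = pd.DataFrame({
    "sip.Method": ["INVITE", None, None, "ACK"],
    "sip.Status-Code": [None, 100.0, 407.0, None],
})

# Render status codes as integer strings (100.0 -> "100"); keep gaps empty.
status_text = df["sip.Status-Code"].map(
    lambda code: str(int(code)) if pd.notna(code) else None
)

# "sip.Message" is a hypothetical helper column: the method where present,
# otherwise the status code.
df["sip.Message"] = df["sip.Method"].fillna(status_text)

print(df["sip.Message"].tolist())
```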


  44.44.44.44             55.55.55.55
     |                        |
     |       INVITE           | First SIP Message: INVITE Method
     |----------------------->|
     |    100 trying          | Second SIP Message: 100 Status Code
     |<-----------------------|
     |    407 Proxy Auth      | Third SIP Message: 407 Status Code
     |<-----------------------|
     |                        |
     |         ACK            | Fourth SIP Message: ACK Method 
     |----------------------->|
     |         INVITE         | Fifth SIP Message: INVITE Method
     |----------------------->|
     |                        |
     |    100 trying          | Sixth SIP Message: 100 Status Code
     |<-----------------------|
     |    180 ringing         | Seventh SIP Message: 180 Status Code
     |<-----------------------|
     |    183 session         | Eighth SIP Message: 183 Status Code
     |<-----------------------|
     |       200 OK           | Ninth SIP Message: 200 Status Code
     |<-----------------------|
     |         ACK            | Tenth SIP Message: ACK Method 
     |----------------------->|
>>> print(dataframe)
    frame.number                           frame.time       ip.src       ip.dst                       sip.Call-ID sip.Method  sip.Status-Code
0        18371.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
1        18403.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
2        19022.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            407.0
3        19221.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
4        19227.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
5        19244.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
6        20875.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            180.0
7        25134.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            183.0
8        25148.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            200.0
9        25355.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
10           NaN                                  NaN          NaN          NaN                               NaN        NaN              NaN
11       18372.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
12       18404.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
13       19023.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            407.0
14       19222.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN
15       19228.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
16       19245.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
17       20876.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            180.0
18       25135.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            183.0
19       25149.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            200.0
20       25356.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN
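
The ordering described above can be checked per call rather than by eye. Here is a minimal sketch, again on made-up, shortened data, that reports whether the frame numbers of each call are in ascending order (`in_order` is an illustrative name):

```python
import pandas as pd

# Hypothetical, shortened data: the first call is in order, the second is not.
df = pd.DataFrame({
    "frame.number": [18371.0, 18403.0, 19022.0, 18372.0, 19023.0, 18404.0],
    "sip.Call-ID": ["id-25a5"] * 3 + ["id-234f"] * 3,
})

# True for a call whose frame numbers never decrease from one row to the next.
in_order = df.groupby("sip.Call-ID", sort=False)["frame.number"].apply(
    lambda frames: frames.is_monotonic_increasing
)

print(in_order.to_dict())
```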

0 answers:

No answers