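How do I parse data from a log that has no delimiters?

Time: 2019-06-12 07:35:56

Tags: python pandas

I have to do some analysis on logs I received from someone. Analyzing each log by hand is very time-consuming, so I am thinking of writing a script with Python and pandas to automate it. However, the fields in the data run together, so I cannot parse it.

The log looks like this: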

14:34:41: [REQ][LS1]->[TUT2] [12]FF00000000000000000088DD (Message1)
14:34:41: [REQ][TUT2]->[LS1] [09]5203000C0C0C0C0E0E (Message2)
14:34:49: [REQ][LS1]->[TUT2] [12]FF00000000000000000088DD (Message1)
14:34:49: [REQ][TUT2]->[LS1] [09]5203000C0C0C0C0E0E (Message2)
14:34:56: [REQ][LS1]->[TUT2] [12]FF00000000000000000088DD (Message1)
14:34:57: [REQ][TUT2]->[LS1] [09]5203000C0C0C0C0E0E (Message2)
14:35:04: [REQ][LS1]->[TUT2] [12]FF00000000000000000088DD (Message1)
14:35:05: [REQ][TUT2]->[LS1] [09]5203000C0C0C0C0E0E (Message2)
14:35:05: [REQ][TUT2]->[000] [25]DB03FFFFFF7F00000000FF7F0000FF7F00FA0FF90F00000000 (Debug Message)

I need output like this:

FF 00 00 00 00 00 00 00 00 00 88 DD
52 03 00 0C 0C 0C 0C 0E 0E
FF 00 00 00 00 00 00 00 00 00 88 DD
52 03 00 0C 0C 0C 0C 0E 0E
FF 00 00 00 00 00 00 00 00 00 88 DD
52 03 00 0C 0C 0C 0C 0E 0E
FF 00 00 00 00 00 00 00 00 00 88 DD
52 03 00 0C 0C 0C 0C 0E 0E
DB 03 FF FF FF 7F 00 00 00 00 FF 7F 00 00 FF 7F 00 FA 0F F9 0F 00 00 00 00

so that I can analyze the data.

I used the following code to parse the data:

import pandas as pd

# Read the log file, splitting each line on single spaces
filename = "file.txt"
df = pd.read_table(filename, sep=' ',
                   names=['Time', 'Src-Dst', 'Data', 'Type', 'Remarks'],
                   engine='python', header=None)
df.head()

But I cannot figure out how to split a value like this into separate columns:

[12]2A00000000000000000088DD

Can anyone help me?

1 answer:

Answer 0: (score: 2)

Use pd.Series.str.findall:

df['Data'].str[4:].str.findall('(.{2})')

Output:

0     [FF, 00, 00, 00, 00, 00, 00, 00, 00, 00, 88, DD]
1                 [52, 03, 00, 0C, 0C, 0C, 0C, 0E, 0E]
2     [FF, 00, 00, 00, 00, 00, 00, 00, 00, 00, 88, DD]
3                 [52, 03, 00, 0C, 0C, 0C, 0C, 0E, 0E]
4     [FF, 00, 00, 00, 00, 00, 00, 00, 00, 00, 88, DD]
5                 [52, 03, 00, 0C, 0C, 0C, 0C, 0E, 0E]
6     [FF, 00, 00, 00, 00, 00, 00, 00, 00, 00, 88, DD]
7                 [52, 03, 00, 0C, 0C, 0C, 0C, 0E, 0E]
8    [DB, 03, FF, FF, FF, 7F, 00, 00, 00, 00, FF, 7...
Name: Data, dtype: object
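
The slice `.str[4:]` assumes the length prefix is always exactly four characters, such as `[12]`. As a minimal sketch, assuming every `Data` value has the form `[NN]` followed only by hex digits, the prefix and payload could instead be separated with `Series.str.extract`, which also allows the declared length to be checked against the number of bytes found. The sample Series and the names `parts`, `declared_len`, `payload`, `byte_lists`, and `lengths_ok` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample values copied from the log in the question (stand-in for df['Data']).
data = pd.Series([
    "[12]FF00000000000000000088DD",
    "[09]5203000C0C0C0C0E0E",
    "[25]DB03FFFFFF7F00000000FF7F0000FF7F00FA0FF90F00000000",
], name="Data")

# Capture the bracketed length and the hex payload as two named columns.
parts = data.str.extract(
    r"^\[(?P<declared_len>\d+)\](?P<payload>[0-9A-Fa-f]+)$"
)

# Split each payload into two-character byte strings.
byte_lists = parts["payload"].str.findall(r"[0-9A-Fa-f]{2}")

# Compare the declared length with the number of bytes actually found.
lengths_ok = parts["declared_len"].astype(int) == byte_lists.str.len()
print(lengths_ok)
```

Rows that do not match the assumed pattern would come back as NaN from `str.extract`, so they can be spotted with `parts['payload'].isna()` before going further.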

If you want the result as a DataFrame, build a new one:

s = df['Data'].str[4:].str.findall('(.{2})')
pd.DataFrame(list(s))

Output:

   0   1   2   3   4   5   6   7   8     9   ...     15    16    17    18  \
0  FF  00  00  00  00  00  00  00  00    00  ...   None  None  None  None   
1  52  03  00  0C  0C  0C  0C  0E  0E  None  ...   None  None  None  None   
2  FF  00  00  00  00  00  00  00  00    00  ...   None  None  None  None   
3  52  03  00  0C  0C  0C  0C  0E  0E  None  ...   None  None  None  None   
4  FF  00  00  00  00  00  00  00  00    00  ...   None  None  None  None   
5  52  03  00  0C  0C  0C  0C  0E  0E  None  ...   None  None  None  None   
6  FF  00  00  00  00  00  00  00  00    00  ...   None  None  None  None   
7  52  03  00  0C  0C  0C  0C  0E  0E  None  ...   None  None  None  None   
8  DB  03  FF  FF  FF  7F  00  00  00    00  ...     7F    00    FA    0F   

     19    20    21    22    23    24  
0  None  None  None  None  None  None  
1  None  None  None  None  None  None  
2  None  None  None  None  None  None  
3  None  None  None  None  None  None  
4  None  None  None  None  None  None  
5  None  None  None  None  None  None  
6  None  None  None  None  None  None  
7  None  None  None  None  None  None  
8    F9    0F    00    00    00    00
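
The question asks for space-separated bytes rather than lists or a wide frame padded with None. Below is a minimal sketch of two possible follow-ups, assuming each payload contains an even number of valid hex digits: joining the two-character chunks with spaces to reproduce the requested text, and converting with Python's built-in `bytes.fromhex` to integers for numeric analysis. The sample Series `payload` stands in for `df['Data'].str[4:]`, and the names `spaced` and `as_ints` are illustrative only.

```python
import pandas as pd

# Hex payloads with the "[NN]" prefix already removed (sample data from the question).
payload = pd.Series([
    "5203000C0C0C0C0E0E",
    "DB03FFFFFF7F00000000FF7F0000FF7F00FA0FF90F00000000",
])

# Two-character chunks joined with spaces, matching the requested layout.
spaced = payload.str.findall(r".{2}").str.join(" ")
print(spaced.iloc[0])  # prints: 52 03 00 0C 0C 0C 0C 0E 0E

# One list of integer byte values (0-255) per row, for numeric analysis.
as_ints = payload.map(lambda p: list(bytes.fromhex(p)))
print(as_ints.iloc[0])
```

Keeping the integers as one list per row avoids the padding that appears when rows of different lengths are forced into a single DataFrame.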