Parsing unusual data with several record types into a pandas DataFrame

Time: 2018-02-12 00:52:31

Tags: python-3.x pandas dataframe

I have a dataset like the following:

Process: matts.exe Pid: 900 Address: 0x7f6a0000
Vad Tag: Vad  Protection: PAGE_EXECUTE_READWRITE
Flags: Protection: 6

0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..
0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................
0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................
0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................

0x7f6a0000 c8000000         ENTER 0x0, 0x0
0x7f6a0004 58               POP EAX
0x7f6a0005 0100             ADD [EAX], EAX
0x7f6a0007 00ff             ADD BH, BH

Process: matts2.exe Pid: 910 Address: 0x7f6a0000
Vad Tag: Vad  Protection: PAGE_EXECUTE_READWRITE
Flags: Protection: 6

0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..
0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................
0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................
0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................

0x7f6a0000 c8000000         ENTER 0x0, 0x0
0x7f6a0004 58               POP EAX
0x7f6a0005 0100             ADD [EAX], EAX
0x7f6a0007 00ff             ADD BH, BH

How can I get this data into a pandas DataFrame like the one below?

Process    Pid   Address     Vad_Tag   Protection              Protection   Hex_out                                                                          Assembly_Out
matts.exe  900   0x7f6a0000  Vad       PAGE_EXECUTE_READWRITE  6            0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..   0x7f6a0000 c8000000         ENTER 0x0, 0x0
                                                                            0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................   0x7f6a0004 58               POP EAX
                                                                            0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................   0x7f6a0005 0100             ADD [EAX], EAX
                                                                            0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................   0x7f6a0007 00ff             ADD BH, BH

matts2.exe 910   0x7f6a0000  Vad       PAGE_EXECUTE_READWRITE  6            0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..   0x7f6a0000 c8000000         ENTER 0x0, 0x0
                                                                            0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................   0x7f6a0004 58               POP EAX
                                                                            0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................   0x7f6a0005 0100             ADD [EAX], EAX
                                                                            0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................   0x7f6a0007 00ff             ADD BH, BH

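Currently I can read it in as a table, but it puts everything into separate rows. I'm using every third blank line as the delimiter between records, but I still have trouble reshaping the data. The hex and assembly output need to be stored as strings; I only laid them out as table columns above for brevity. Any help would be appreciated.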

1 answer:

Answer 0 (score: 1)

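You should make two passes. In the first, use read_table(usecols=[0]) to parse the first "word" of each line. Then use that Series to determine where each section starts and ends, and call read_table(skiprows=X, nrows=Y) once per section, where a section is a block of lines with a uniform format.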