Read data from multiple lines as a single item

Time: 2017-02-19 04:15:38

Tags: python python-3.x pandas

I have a set of data from a file:

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
      00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
      77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
      00,2e,00,77,00,61,00,76,00,ff,00

"johnnyboy"="gotwastedatthehouse"

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
      00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
      77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
      00,2e,00,77,00,61,00,76,00,ff,00


[mattplayhouse\wherecanwego\tothepoolhall]

How can I read/reference the text of each "johnnyboy"=splice(23) entry as if it were a single line:

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,00,2e,00,77,00,61,00,76,00,ff,00

I am currently matching on splice(23): with a regex search like this:

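import re

re_johnny = re.compile('splice')
with open("file.txt", 'r') as file:
    read = file.readlines()
    for line in read:
        # search() looks for 'splice' anywhere in the line; match() only
        # tests the start of the line, which here is "johnnyboy".
        if re_johnny.search(line):
            print(line)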

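I think I need to strip out the backslashes and whitespace to merge the lines, but I'm not sure how to do that without ending up with blank lines or with new lines that no longer match my regex. On my first attempt, the last line of the file was pulled in incorrectly. Any help would be great.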

2 Answers:

Answer 0: (score: 1)

Input file: fin

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
      00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
      77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
      00,2e,00,77,00,61,00,76,00,ff,00

"johnnyboy"="gotwastedatthehouse"

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,\
      00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,\
      77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,\
      00,2e,00,77,00,61,00,76,00,ff,00


[mattplayhouse\wherecanwego\tothepoolhall]

Following tigerhawk's suggestion, you could try something like this:

Code:

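with open('fin', 'r') as f:
    # Blank lines separate the records, so split the file into blocks.
    for block in f.read().split('\n\n'):
        # Split each block on whitespace, strip the backslashes from the
        # ends of each piece, and join the pieces into one line.
        line = ''.join(part.strip('\\') for part in block.split())
        if 'splice' in line:
            print(line)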

Output:

"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,00,2e,00,77,00,61,00,76,00,ff,00
"johnnyboy"=splice(23):15,00,30,00,31,00,32,02,39,00,62,00,a3,00,33,00,2d,0f,39,00,00,5c,00,6d,00,65,00,64,00,69,00,61,00,5c,00,57,00,69,00,6e,00,64,00,6f,00,77,00,73,00,20,00,41,00,61,00,63,00,6b,00,65,aa,72,00,6f,00,75,00,6e,dd,64,00,2e,00,77,00,61,00,76,00,ff,00
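
If the records are not always separated by blank lines, a variation is to remove the line continuations themselves before splitting into lines. The following is only a sketch under that assumption; the helper names join_continuations and splice_lines are made up for illustration, and it assumes every continuation is a backslash placed immediately before the newline.

```python
import re


def join_continuations(text):
    """Remove each backslash-newline pair and the indentation after it."""
    return re.sub(r'\\\n[ \t]*', '', text)


def splice_lines(text):
    """Return the joined lines that contain 'splice'."""
    joined = join_continuations(text)
    return [line for line in joined.splitlines() if 'splice' in line]


if __name__ == '__main__':
    # 'fin' is the input file name used in this answer.
    with open('fin', 'r') as f:
        for line in splice_lines(f.read()):
            print(line)
```

Because only a backslash directly followed by a newline is removed, the backslashes inside a line such as [mattplayhouse\wherecanwego\tothepoolhall] are left alone.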

Answer 1: (score: 0)

With regular expressions, your problems multiply. Instead, keep it simple:

  • If a line starts with ", it begins a new record.
  • Otherwise, append it to the previous record.

You can implement a parser for this scheme in a few lines of Python, and you don't need regular expressions.
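
One way the two rules above might be sketched is shown below. The function name join_records is hypothetical, and the code adds one assumption that the rules do not state: a line is appended only while the previous line ended with a backslash. Without that, the trailing [mattplayhouse\wherecanwego\tothepoolhall] line would be glued onto the last record.

```python
def join_records(lines):
    """Group physical lines into logical records.

    A line starting with a double quote begins a new record. Any other
    line is appended to the current record, but only while the previous
    line ended with a backslash (an assumption added in this sketch).
    """
    records = []
    continuing = False  # True while the current record expects more lines
    for raw in lines:
        line = raw.strip()
        if line.startswith('"'):
            records.append(line.rstrip('\\'))
            continuing = line.endswith('\\')
        elif continuing:
            records[-1] += line.rstrip('\\')
            continuing = line.endswith('\\')
        else:
            # Blank lines and unrelated lines such as [section] headers.
            continuing = False
    return records


if __name__ == '__main__':
    # "file.txt" is the file name used in the question.
    with open('file.txt', 'r') as f:
        for record in join_records(f):
            if 'splice' in record:
                print(record)
```

The function accepts any iterable of lines, so it can be given an open file directly or a list of strings when testing.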