Question

This is the content of a file that contains more than 6000 lines like these:

0000000: 01010000 01001011 00000011 00000100 00010100 00000011  PK....  
0000006: 00000000 00000000 00001000 00000000 01000000 10001101  ....@.  
000000c: 00101001 01000110 10011111 00101100 00000001 11100100  )F.,..  
0000012: 01111100 00101011 00000000 00000000 10111110 11010111  |+....  
0000018: 00000010 00000000 00001101 00000000 00000000 00000000  ......  
000001e: 01110000 01100001 01101110 01100100 01100001 01011111  panda_  
0000024: 01100010 01101001 01101110 00101110 01110100 01111000  bin.tx  
000002a: 01110100 10101100 10011010 01001001 10101110 10011011  t..I..  
0000030: 01010000 00010000 01000101 11100111 01011001 10000101  P.E.Y.

What I need is to pull part of each line (columns 2 through 7 only), up to EOF, and put it in another file using Python.

I first tried simply copying everything until EOF:

import StringIO 

infile = "input.txt"
outfile = open("dump.txt", "w")

with open(infile, 'r') as contents:
    line_infile = contents.readline()
    while line_infile:
        outfile.write(line_infile)
        line_infile = contents.readline()
outfile.close()

That works.

As a second step I added the re module, and this is where I got stuck. This is the code I wrote:

import StringIO 
import re

infile = "input.txt"
outfile = open("dump.txt", "w")
match = re.compile(ur': (.*?)  ')

with open(infile, 'r') as contents:
    line_infile = contents.readline()
    while line_infile:
        outfile.write(re.findall(match, line_infile))
        line_infile = contents.readline()
outfile.close()

It gives this error:

outfile.write(re.findall(match, line_infile))
TypeError: expected a character buffer object

When I tried re.copy_reg instead of re.findall:

outfile.write(re.copy_reg(match, line_infile))
TypeError: 'module' object is not callable

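I am a beginner at programming and Python. From what I have learned so far, I have to use a regex to match the string and a buffer to read large amounts of data. I'm using the regex ': (.*?)  ' to select the content between two delimiters: ": " (a ':' followed by a space) and "  " (two spaces).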

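Questions:

How do I copy the content matched by the regex and put it in another file?
Should I use a buffer? (I don't know how to use one, and I could not find examples or tutorials on using a buffer with readline() and write().)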

Answer 1

If everything you need is in columns 2 through 7, you can split the line and use only the elements you need.

infile = "input.txt"
outfile = open("dump.txt", "w")

with open(infile, 'r') as contents:
    for line in contents:
        line_infile = line.split(' ')[1:7]
        outfile.write(' '.join(line_infile) + '\n')

outfile.close()
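
A variation on the same idea, offered as a sketch rather than a definitive version: split() with no argument collapses runs of whitespace, so the slice does not depend on there being exactly one space between columns. Iterating over the file object already reads the input lazily, line by line, with Python's own buffering, so no explicit buffer should be needed for a few thousand lines. The names extract_columns and dump_columns are assumptions made for illustration.

```python
def extract_columns(line, first=1, last=7):
    """Return whitespace-separated fields first..last-1 of a line, joined by single spaces."""
    return ' '.join(line.split()[first:last])


def dump_columns(in_path, out_path):
    # Both files are closed automatically when the with block ends.
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:  # the file object yields one line at a time
            fields = extract_columns(line)
            if fields:  # skip blank lines
                dst.write(fields + '\n')
```

Calling dump_columns("input.txt", "dump.txt") would mirror the file names used above. Note that a final line with fewer than six byte fields would let the text column slip into the slice, so that case may need separate handling.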

Answer 2

Try this: use the regex [^\:]+:([\d\s]+\s\s).* which gives you columns 2 to 7 of each line, with a newline after each match.

import re
# Capture the digits and whitespace that follow the first colon (columns 2-7).
p = re.compile(r'[^\:]+:([\d\s]+\s\s).*', re.MULTILINE)
test_str = u"0000000: 01010000 01001011 00000011 00000100 00010100 00000011  PK.... \n0000006: 00000000 00000000 00001000 00000000 01000000 10001101  ....@. \n000000c: 00101001 01000110 10011111 00101100 00000001 11100100  )F.,.. \n0000012: 01111100 00101011 00000000 00000000 10111110 11010111  |+.... \n0000018: 00000010 00000000 00001101 00000000 00000000 00000000  ...... \n000001e: 01110000 01100001 01101110 01100100 01100001 01011111  panda_ \n0000024: 01100010 01101001 01101110 00101110 01110100 01111000  bin.tx \n000002a: 01110100 10101100 10011010 01001001 10101110 10011011  t..I.. \n0000030: 01010000 00010000 01000101 11100111 01011001 10000101  P.E.Y. "
# Raw string, so that \1 is a backreference to the group and not the character chr(1).
subst = r"\1\n"

result = re.sub(p, subst, test_str)
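
On the TypeError from the question: re.findall returns a list of strings, while write() expects a single string, so the match has to be taken out of the list (or joined) before writing. A minimal sketch with the asker's own pattern, using search() and group(1); the helper name bytes_from_line is hypothetical.

```python
import re

# The asker's pattern: lazily capture everything between ": " and the first double space.
PATTERN = re.compile(r': (.*?)  ')


def bytes_from_line(line):
    """Return the captured byte columns as a string, or None if the line does not match."""
    m = PATTERN.search(line)
    return m.group(1) if m else None


sample = "0000000: 01010000 01001011 00000011 00000100 00010100 00000011  PK....  \n"
print(PATTERN.findall(sample))   # a list holding one string, which write() rejects
print(bytes_from_line(sample))   # a plain string, suitable for write()
```

Returning None for non-matching lines is a design choice here; it lets the caller decide whether to skip such lines or report them.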

Python update: this may work now.

import re

infile = "input.txt"
# Capture the digits and whitespace that follow the first colon (columns 2-7).
p = re.compile(r'[^\:]+:([\d\s]+\s\s).*')
# Raw string, so that \1 is a backreference; each line keeps its own newline.
subst = r"\1"

# Both files are closed automatically when the with block ends.
with open(infile, 'r') as contents, open("dump.txt", "w") as outfile:
    for line_infile in contents:
        outfile.write(re.sub(p, subst, line_infile))
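
One detail worth checking in any replacement string passed to re.sub: in a non-raw literal, "\1" is the octal escape for the single character chr(1), not a reference to capture group 1. The short demonstration below uses a shortened sample line that is made up for illustration.

```python
import re

pattern = re.compile(r'[^:]+:([\d\s]+\s\s).*')
line = "0000000: 01010000 01001011  PK....  "

plain = re.sub(pattern, "\1", line)    # "\1" is the single character chr(1)
raw = re.sub(pattern, r"\1", line)     # r"\1" refers to capture group 1

print(repr(plain))
print(repr(raw))
```

With the raw string the result keeps the leading space and the two trailing spaces that the group captures, so a strip() may be wanted before writing.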

How to use regex or split in Python 2.7

2 answers: