Question

I have a file with the following structure:

******
Block 1
text
text
...
End 
******
Block 2
text
text
...
End 
******
Block 3
text
text
...
End 
******

and so on. I want to open the file, read each line, and save the contents of the first block in a string. This is what I have so far:

Block = ''
with open(File) as file:
    for line in file:
        if re.match('\.Block.*', line):
            Block += line
        if 'str' in line:
            break
print(Block)

However, when I print Block, I get:

Block 1
Block 2
...

How can I use a regular expression to copy the lines from Block 1 through End? Thanks.

Answer 1

You can use itertools.groupby:

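import itertools, re
with open('filename.txt') as f:
    lines = [line.rstrip('\n') for line in f]
# Group consecutive lines by "is this a ****** separator?" and keep the non-separator groups
groups = itertools.groupby(lines, key=lambda x: bool(re.match(r'\*+$', x)))
first_result, *_ = [list(g) for is_sep, g in groups if not is_sep]
print(first_result)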

Output:

['Block 1', 'text', 'text', '...', 'End ']
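
Since the question asks for a string rather than a list, here is a minimal sketch of the same groupby idea wrapped in a function that joins the first group. The names is_separator, first_block and SAMPLE_LINES are made up for illustration, and the sample lines are inlined (assumed to mirror the file in the question) so that it runs without a file:

```python
import itertools
import re

# Hypothetical sample data mirroring the layout in the question.
SAMPLE_LINES = ['******', 'Block 1', 'text', 'text', '...', 'End ',
                '******', 'Block 2', 'text', 'End ', '******']

def is_separator(line):
    """Return True for a line made up only of asterisks."""
    return re.fullmatch(r'\*+', line) is not None

def first_block(lines):
    """Return the first run of non-separator lines joined into one string."""
    for is_sep, group in itertools.groupby(lines, key=is_separator):
        if not is_sep:
            # Consume the group right away; groupby invalidates it on the next step.
            return '\n'.join(group)
    return ''  # no block found

print(first_block(SAMPLE_LINES))
```

Returning from inside the loop also means the rest of the input is never grouped, which may matter for a large file.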

Answer 2

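You only append a line when that line itself matches the Block regex, so only the header lines are collected. To capture the contents of a block you have to do a bit more work: keep a flag that records whether you are currently inside a block.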

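import re

File = 'filename.txt'  # placeholder path to the input file
Block = ''
Match = False  # True while we are inside the first block
with open(File) as file:
    for line in file:
        # Start copying at the first line that begins with "Block"
        if re.match(r'Block\b', line):
            Match = True
        if Match:
            Block += line
            # Stop once the closing "End" line (trailing spaces allowed) is copied
            if re.match(r'End\s*$', line):
                break
print(Block)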
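
If you later need more than the first block, the same flag-based scan can be turned into a generator. This is only a sketch under the assumption that every block starts with a line beginning with "Block" and ends with a line containing just "End" plus optional whitespace; iter_blocks and sample are names invented for this example:

```python
import re

def iter_blocks(lines):
    """Yield each block, from its 'Block' header through its 'End' line, as one string."""
    current = []     # lines of the block being collected
    inside = False   # True between a header line and its End line
    for line in lines:
        if not inside and re.match(r'Block\b', line):
            inside = True
        if inside:
            current.append(line)
            if re.match(r'End\s*$', line):
                yield ''.join(current)
                current = []
                inside = False

# Hypothetical sample lines, as they would come from iterating over a file.
sample = ['******\n', 'Block 1\n', 'text\n', 'End \n',
          '******\n', 'Block 2\n', 'more\n', 'End \n', '******\n']

first = next(iter_blocks(sample), '')  # '' if the input has no block
print(first, end='')
```

Because it is a generator, next() stops reading as soon as the first block is complete, and passing an open file object instead of sample should work the same way.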

Answer 3

import re

with open(File) as ff:  # File is the path to the input file, as in the question
    txt = ff.read()  # read the whole file into one string

re.findall(r"(?ms)^\s*Block\s*\d+.*?^\s*End\s*$", txt)

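Out:
['Block 1\ntext\ntext\n...\nEnd ',
 'Block 2\ntext\ntext\n...\nEnd ',
 'Block 3\ntext\ntext\n...\nEnd ']

Or change '\d+' to '1' to get only the first block (use '1\b' if the file may also contain Block 10, Block 11, ...).
(?ms): m is multiline mode, so ^ and $ match at the start and end of each line;
       s makes '.' match newlines too.
?: makes '.*?' non-greedy, so each match stops at the nearest End line.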
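
If only the first block is wanted, re.search stops at the first match instead of collecting all of them. A small sketch using the same pattern with the flags spelled out; BLOCK_RE, find_first_block and the inline txt are illustrative names and data, not part of the original answer:

```python
import re

# Same pattern as above; re.MULTILINE | re.DOTALL is equivalent to the inline (?ms).
BLOCK_RE = re.compile(r"^\s*Block\s*\d+.*?^\s*End\s*$", re.MULTILINE | re.DOTALL)

def find_first_block(text):
    """Return the first Block ... End section of text, or None if there is none."""
    match = BLOCK_RE.search(text)
    return match.group(0) if match else None

# Hypothetical file contents, inlined so the sketch runs without a file.
txt = "******\nBlock 1\ntext\ntext\n...\nEnd \n******\nBlock 2\ntext\nEnd \n******\n"
print(repr(find_first_block(txt)))
```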

How do I copy a section from a file using a regular expression?
