Parse the lines between two matching strings, excluding the second string

Time: 2019-09-19 15:09:09

Tags: python

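I want to parse the lines between two strings that occur repeatedly in a file. The first string marks the start of the lines I want, and the second string marks the end. I do not want the line containing the end string to be included.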

This question, https://askubuntu.com/questions/786922/how-to-capture-lines-between-two-strings-from-a-file-but-only-the-last-occurren, is close, but it captures only the last occurrence of the target block of lines.

Following that example, suppose my file looks like this:

ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2016
...

ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2017
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
Transfer completed successfully at Fri May 27 14:05:16 BST 2017
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2016
...
ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2018
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
Transfer completed successfully at Fri May 27 14:05:16 BST 2018
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2018
...

How should I modify this code:

start = "Transfer started at"
end = "Transfer completed successfully"
buffer = ""
log = False

for line in open('test.txt'):
    if line.startswith(start):
        buffer = line
        log = True
    elif line.startswith(end):
        buffer += line
        log = False
    elif log:
        buffer += line

#print(buffer)

so that, instead of printing only the last block, it prints every block between the start and end strings, excluding the end string?

A possible structure for the expected output:

2016: Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.

2017: Transfer started at Fri May 27 13:50:45 BST 2017
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file

2018: Transfer started at Fri May 27 13:50:45 BST 2018
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file

Thanks.

2 answers:

Answer 0 (score: 2)

A regular expression is probably your best option:

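import re

start = "Transfer started at"
end = "Transfer completed successfully"

with open('test.txt', 'r') as test_file:
    test_file_text = test_file.read()

# (?s) lets "." match newlines; ".*?" stops at the nearest end marker, and the
# lookahead (?=...) keeps that marker out of the match. re.escape() guards
# against regex metacharacters in the marker strings.
pattern = rf'(?s){re.escape(start)}.*?(?={re.escape(end)})'
desired_output = '\n'.join(re.findall(pattern, test_file_text))

print(desired_output)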

This gives you the following output:

Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.

Transfer started at Fri May 27 13:50:45 BST 2017
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file

Transfer started at Fri May 27 13:50:45 BST 2018
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
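To see why the lazy quantifier and the lookahead matter, here is a minimal sketch that runs the same kind of pattern on a small in-memory string. The `sample` text and the names `lazy`, `greedy`, and `lookahead` are made up for illustration and are not part of the original answer.

```python
import re

start = "Transfer started at"
end = "Transfer completed successfully"

# Hypothetical two-block sample standing in for the contents of test.txt.
sample = (
    "Transfer started at A\n"
    "Logs transferred successfully.\n"
    "Transfer completed successfully at A\n"
    "unrelated line\n"
    "Transfer started at B\n"
    "Images transferred successfully.\n"
    "Transfer completed successfully at B\n"
)

# The lookahead checks that the end marker follows without consuming it,
# so the end line never becomes part of a match.
lookahead = "(?=" + re.escape(end) + ")"

# Lazy ".*?": each match stops just before the nearest end marker.
lazy = re.findall(re.escape(start) + ".*?" + lookahead, sample, flags=re.DOTALL)

# Greedy ".*": a single match runs on to just before the last end marker.
greedy = re.findall(re.escape(start) + ".*" + lookahead, sample, flags=re.DOTALL)

print(len(lazy), len(greedy))  # prints "2 1"
```

With the greedy form, the one match swallows the first block's end line and everything between the blocks, which is why the lazy form is the one to use here.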

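Alternatively, if you just want to modify your current structure, this yields the same blocks as the regular expression version: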

start = "Transfer started at"
end = "Transfer completed successfully"
buffer = ""
log = False

with open('test.txt', 'r') as test_file:
    for line in test_file:
        if line.startswith(start):
            log = True        # entering a block
        elif line.startswith(end):
            log = False       # leaving the block; the end line itself is skipped
            buffer += "\n"    # blank line between blocks

        if log:
            buffer += line

print(buffer)
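Neither version above adds the year prefix shown in the question's expected output. A minimal sketch of one way to do that follows, assuming the start line always ends with the year (as it does in the sample log). The helper names `collect_blocks` and `label_with_year` and the shortened `sample` lines are hypothetical.

```python
def collect_blocks(lines, start="Transfer started at",
                   end="Transfer completed successfully"):
    """Return a list of blocks, each a list of lines from a start line
    up to, but not including, the matching end line."""
    blocks = []
    current = None  # None means we are outside a block
    for line in lines:
        if line.startswith(start):
            current = [line.rstrip("\n")]
        elif line.startswith(end):
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            current.append(line.rstrip("\n"))
    # A block that never reaches an end line is dropped in this sketch.
    return blocks


def label_with_year(block):
    """Prefix the block's first line with its last token (assumed to be the year)."""
    year = block[0].split()[-1]
    return "\n".join([f"{year}: {block[0]}"] + block[1:])


# Hypothetical, shortened stand-in for the lines of test.txt.
sample = [
    "Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.\n",
    "Transfer started at Fri May 27 13:50:45 BST 2016\n",
    "Logs transferred successfully.\n",
    "Transfer completed successfully at Fri May 27 14:05:16 BST 2016\n",
    "Transfer started at Fri May 27 13:50:45 BST 2017\n",
    "ERROR: transfer not complete by end of log file\n",
    "Transfer completed successfully at Fri May 27 14:05:16 BST 2017\n",
]

report = "\n\n".join(label_with_year(b) for b in collect_blocks(sample))
print(report)
```

To run it on the real file, pass the open file object instead of `sample`, for example `collect_blocks(open('test.txt'))`, since iterating a file yields its lines.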

Answer 1 (score: 1)

You forgot the + operator in your code. Write it as:

if line.startswith(start):
    buffer += line

I think that will give you the result you want.
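Note that `+=` makes the buffer accumulate every block, but the question's `elif line.startswith(end)` branch still appends the end line unless that `buffer += line` is removed too. The sketch below wraps the question's loop in a hypothetical function, `accumulate`, with a made-up `keep_end_line` flag and an in-memory `sample`, purely to compare the two behaviours.

```python
def accumulate(lines, keep_end_line):
    """The question's loop with `+=` in the start branch.
    keep_end_line is a hypothetical switch used only for this comparison."""
    start = "Transfer started at"
    end = "Transfer completed successfully"
    buffer = ""
    log = False
    for line in lines:
        if line.startswith(start):
            buffer += line  # append instead of overwriting earlier blocks
            log = True
        elif line.startswith(end):
            if keep_end_line:
                buffer += line  # what the question's code does
            log = False
        elif log:
            buffer += line
    return buffer


# Hypothetical, shortened stand-in for the lines of test.txt.
sample = [
    "Transfer started at Fri May 27 13:50:45 BST 2016\n",
    "Logs transferred successfully.\n",
    "Transfer completed successfully at Fri May 27 14:05:16 BST 2016\n",
    "Transfer started at Fri May 27 13:50:45 BST 2017\n",
    "Images transferred successfully.\n",
    "Transfer completed successfully at Fri May 27 14:05:16 BST 2017\n",
]

with_end = accumulate(sample, keep_end_line=True)
without_end = accumulate(sample, keep_end_line=False)
print(without_end)
```

Both results contain the 2016 and 2017 blocks; only `without_end` leaves out the "Transfer completed successfully" lines, which is what the question asks for.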