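I want to parse the lines between two strings that occur repeatedly in a file. The first string marks the start of the target lines and the last string marks the end. I do not want the end string to be included.
This question https://askubuntu.com/questions/786922/how-to-capture-lines-between-two-strings-from-a-file-but-only-the-last-occurren comes close, but it captures only the last occurrence of the target block.
Working from that example, assume my file looks like this: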
ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2016
...
ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2017
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
Transfer completed successfully at Fri May 27 14:05:16 BST 2017
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2016
...
ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2018
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
Transfer completed successfully at Fri May 27 14:05:16 BST 2018
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2018
...
How should I modify this code:
start = "Transfer started at"
end = "Transfer completed successfully"
buffer = ""
log = False
for line in open('test.txt'):
    if line.startswith(start):
        buffer = line
        log = True
    elif line.startswith(end):
        buffer += line
        log = False
    elif log:
        buffer += line
#print(buffer)
so that, instead of printing only the last block, it prints every block between the start and end strings, excluding the end string?
A possible structure for the expected output is:
2016: Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
2017: Transfer started at Fri May 27 13:50:45 BST 2017
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
2018: Transfer started at Fri May 27 13:50:45 BST 2018
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
Thanks.
Answer 0 (score: 2)
A regular expression is probably your best option:
import re
start = "Transfer started at"
end = "Transfer completed successfully"
with open('test.txt', 'r') as test_file:
    test_file_text = test_file.read()
desired_output = '\n'.join(re.findall(rf'(?s){start}.*?(?={end})', test_file_text))
print(desired_output)
This gives you the following output:
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer started at Fri May 27 13:50:45 BST 2017
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
Transfer started at Fri May 27 13:50:45 BST 2018
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file
Alternatively, if you would rather just modify your current structure, this gives the same output:
start = "Transfer started at"
end = "Transfer completed successfully"
buffer = ""
log = False
with open('test.txt', 'r') as test_file:
    for line in test_file:
        if line.startswith(start):
            log = True
        elif line.startswith(end):
            log = False
            buffer += "\n"
        if log:
            buffer += line
print(buffer)
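Neither version above adds the year prefix shown in the question's expected output. A minimal sketch of one way to get it follows. It assumes the year is the last four-digit number on each block's first line, and the names `iter_blocks` and `label_with_year` are hypothetical, not taken from the question.

```python
import re


def iter_blocks(lines, start, end):
    """Yield each block of lines from a start marker up to, but excluding, the end marker."""
    block = None  # None means we are currently outside a block
    for line in lines:
        if line.startswith(start):
            block = [line]       # a start marker opens a fresh block
        elif line.startswith(end):
            if block is not None:
                yield block      # close the block without the end line
            block = None
        elif block is not None:
            block.append(line)


def label_with_year(block):
    """Prefix a block with its year (assumption: last 4-digit number on the first line)."""
    years = re.findall(r"\b\d{4}\b", block[0])
    prefix = f"{years[-1]}: " if years else ""
    return prefix + "".join(block)


# Small in-memory sample; an open file object can be passed the same way.
sample = [
    "Transfer started at Fri May 27 13:50:45 BST 2016\n",
    "Logs transferred successfully.\n",
    "Transfer completed successfully at Fri May 27 14:05:16 BST 2016\n",
]
for block in iter_blocks(sample, "Transfer started at", "Transfer completed successfully"):
    print(label_with_year(block), end="")
```

Because `iter_blocks` only needs an iterable of lines, `open('test.txt')` can be handed to it directly. A block that has no end marker before the file ends is silently dropped in this sketch.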
Answer 1 (score: 1)
You forgot the + operator in your code. Write it as:
if line.startswith(start):
    buffer += line
I think you will then get the result you want.
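Applied to the loop from the question, that change might look like the sketch below. It additionally assumes that the branch for the end marker stops appending its line, since the question asks for the end string to be left out. The function name `collect_blocks` is hypothetical.

```python
def collect_blocks(lines, start, end):
    """Accumulate every start..end block into one string, omitting the end line."""
    buffer = ""
    log = False
    for line in lines:
        if line.startswith(start):
            buffer += line  # += keeps earlier blocks instead of overwriting them
            log = True
        elif line.startswith(end):
            log = False     # stop logging; the end line itself is not appended
        elif log:
            buffer += line
    return buffer
```

Calling `collect_blocks(open('test.txt'), start, end)` with the question's `start` and `end` values would return all blocks in a single string.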