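When a keyword is found in a block of data, I want to extract/output some of that data. How can I use a regular expression to retrieve everything from the first "#" to the last ")"?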
//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)
Code
import re
with open("Log_1.txt", 'r') as f:
    result = re.search('#(.*)#', f.read())
print(result.group(0))
This is not my full code, but if the keyword is "reportChange", the output should be >>>
# DON'T WANT #
.
.
.
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)
instead of
# DON'T WANT #
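Answer 0 (score: 2)
Assuming you want to start from the most recent # DON'T WANT #, use the regex #(.*)#[^)]+yourKeyWordHere[^)]+\). In Python you can use string formatting and put {} where the keyword goes, so that it can be replaced with any word you want.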
import re
keyword = 'reportChange'
with open("Log_1.txt", 'r') as f:
    # {} is filled with the keyword; the raw string keeps \) as a literal ")".
    result = re.search(r'#(.*)#[^)]+{}[^)]+\)'.format(keyword), f.read())
print(result.group(0))
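The same pattern can be wrapped in a small function. This is only a minimal sketch and not part of the original answer: find_block is a hypothetical helper, the log text is held in a string instead of being read from Log_1.txt, re.escape is added on the assumption that the keyword should be matched literally, and a missing keyword returns None instead of raising an AttributeError.

```python
import re

# Sample log text copied from the question (kept in memory instead of Log_1.txt).
LOG = """# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)
"""


def find_block(text, keyword):
    """Return the first block containing keyword, or None (hypothetical helper)."""
    # re.escape keeps any regex metacharacters in the keyword literal.
    pattern = r'#(.*)#[^)]+{}[^)]+\)'.format(re.escape(keyword))
    match = re.search(pattern, text)
    return match.group(0) if match else None


block = find_block(LOG, 'failed')
print(block)
```

Because [^)]+ cannot cross a closing parenthesis, the match cannot start in one block and end in the next, which is why searching for 'failed' skips the first block here.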
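Answer 1 (score: 1)
For the regex, you need a negative lookahead together with a negative lookbehind.
Try the following as your regex: (?!#).*(?<![)])
It should output everything between # and ).
For the future: use regex101.com to test your regular expressions.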
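Besides regex101.com, a pattern can be checked directly in Python. The snippet below is a minimal sketch that runs the pattern from this answer against two sample lines taken from the question; the variable names are illustrative only. Note that without re.DOTALL the dot does not cross newlines, so each match stays on a single line, and the pattern may need adapting before it covers a whole block.

```python
import re

# Pattern from the answer: negative lookahead for "#", negative lookbehind for ")".
pattern = re.compile(r'(?!#).*(?<![)])')

header = "# DON'T WANT #"
last_line = "(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)"

# (?!#) refuses to start a match on "#", so the search starts one character later.
header_match = pattern.search(header).group(0)

# (?<![)]) refuses to end a match right after ")", so the final ")" is left out.
line_match = pattern.search(last_line).group(0)

print(repr(header_match))
print(repr(line_match))
```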
Answer 2 (score: 1)
This code prints only the data blocks that contain reportChange::someMoreInfo called with invalid some ID:
data = '''//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
'''
import re
# Blocks are separated by a blank line, so split on it and test each block on its own.
for d in re.split(r'\n\n', data):
    g = re.findall(r'^# DON\'T WANT #.*reportChange::someMoreInfo called with invalid some ID\)$', d, flags=re.M|re.DOTALL)
    if g:
        print(g[0])
        print()
Prints:
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
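The block-by-block idea can be generalized to any keyword. The following is a minimal sketch rather than the answer's own code: blocks_with_keyword is a hypothetical helper, it assumes blocks are separated by a blank line and begin with the # DON'T WANT # header, it swaps the answer's regex match for a plain substring test, and SAMPLE is an abbreviated version of the data above.

```python
import re

HEADER = "# DON'T WANT #"

# Abbreviated sample data: three blocks separated by blank lines.
SAMPLE = """//Log_1.txt
# DON'T WANT #
{12345.54321}
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
"""


def blocks_with_keyword(text, keyword):
    """Yield each blank-line-separated block that contains keyword (hypothetical helper)."""
    for block in re.split(r'\n\s*\n', text.strip()):
        # Start at the header so leading lines such as "//Log_1.txt" are dropped.
        start = block.find(HEADER)
        if start != -1 and keyword in block:
            yield block[start:]


matches = list(blocks_with_keyword(SAMPLE, "reportChange"))
for block in matches:
    print(block)
    print()
```

Splitting first keeps each test local to one block, so a greedy pattern can never run from the header of one block to the closing parenthesis of another.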