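Python: How to find a repeating string with a regular expression

Time: 2019-07-16 14:18:05

Tags: regex python-3.x recursion data-extraction

I want to extract/output some data whenever a keyword is found in a block of data. How can I use a regular expression to retrieve everything from the first "#" to the last ")"?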

//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE 
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE 
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

Code

import re

with open("Log_1.txt", 'r') as f:
    result = re.search('#(.*)#', f.read())

print(result.group(0))

This is not my whole code, but if the keyword is "reportChange", the output should be:

# DON'T WANT #
  .
  .
  .
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

instead of

# DON'T WANT #

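3 answers:

Answer 0 (score: 2)

Assuming you want to start from the closest preceding # DON'T WANT #, use the regex #(.*)#[^)]+yourKeyWordHere[^)]+\). In Python you can use string formatting and put {} where the keyword goes, so it can be replaced with any word you want.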

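import re

keyword = 'reportChange'

with open("Log_1.txt", 'r') as f:
    # A raw string keeps \) intact; re.escape makes the keyword match literally.
    result = re.search(r'#(.*)#[^)]+{}[^)]+\)'.format(re.escape(keyword)), f.read())

# re.search returns None when nothing matches.
if result:
    print(result.group(0))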
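
As a sketch only, the same pattern can be wrapped in a helper that returns every matching block rather than just the first one. The name find_blocks and the inlined SAMPLE text are assumptions made for illustration; in practice the text would come from Log_1.txt. The keyword is passed through re.escape so that characters such as "." or ":" are matched literally.

```python
import re

# Inlined copy of the log from the question (assumption: same layout as Log_1.txt).
SAMPLE = """# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)
"""


def find_blocks(text, keyword):
    """Return each span from a '# ... #' header to the ')' after the keyword."""
    # [^)]+ can cross newlines but not a ')', so a match never leaks
    # from one block into the next.
    pattern = r"#.*#[^)]+{}[^)]*\)".format(re.escape(keyword))
    return [m.group(0) for m in re.finditer(pattern, text)]


blocks = find_blocks(SAMPLE, "failed")
for block in blocks:
    print(block)
```

Because "." does not match a newline by default, #.*# only ever matches the header line; the [^)]+ part is what carries the match down to the line containing the keyword.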

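Answer 1 (score: 1)

For the regex, you have to use a negative lookahead as well as a negative lookbehind.

Try the following as the regex: (?!#).*(?<![)]). It should output everything between the # and the ).

For the future: use regex101.com to test your regular expressions.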
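
For comparison, here is a minimal sketch of one way to capture the span from a "#" header line through the closing ")" of the same block without lookarounds. It is not the pattern from this answer. It assumes each block ends with a line whose last character is ")", and it combines re.DOTALL with a non-greedy .*? so the match can cross line breaks yet stops at the first such line. The names BLOCK_RE and sample are illustrative only.

```python
import re

sample = (
    "# DON'T WANT #\n"
    "{12345.54321}\n"
    "[Tues Jul 2 01:23:45 2019]\n"
    "< SOME_TYPE\n"
    "(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)\n"
    "\n"
    "# DON'T WANT #\n"
    "{12345.54321}\n"
    "[Tues Jul 2 01:23:45 2019]\n"
    "< SOME_TYPE\n"
    "(some_ID = [12345] failed::someMoreInfo called with invalid some ID)\n"
)

# ^#   : a line that starts with '#'          (re.MULTILINE)
# .*?  : as little as possible, newlines too  (re.DOTALL)
# \)$  : a ')' that is the last character of a line
BLOCK_RE = re.compile(r"^#.*?\)$", flags=re.MULTILINE | re.DOTALL)

blocks = BLOCK_RE.findall(sample)
print(len(blocks))
print(blocks[0])
```

Each element of blocks is one complete block, which can then be filtered with an ordinary substring test such as "reportChange" in block.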

Answer 2 (score: 1)

This code prints only the data blocks that contain reportChange::someMoreInfo called with invalid some ID:

data = '''//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
'''

import re

for d in re.split(r'\n\n', data):
    g = re.findall(r'^# DON\'T WANT #.*reportChange::someMoreInfo called with invalid some ID\)$', d, flags=re.M|re.DOTALL)
    if g:
        print(g[0])
        print()

Prints:

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
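
The same split-then-filter idea can be sketched with the keyword as a parameter and a plain substring test in place of the second regex. The helper name blocks_with_keyword is hypothetical, and the sketch assumes that blocks are separated by blank lines and that every block starts with its "#" header line.

```python
import re

data = """# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
"""


def blocks_with_keyword(text, keyword):
    """Split text on blank lines and keep only the blocks that contain keyword."""
    blocks = (block.strip() for block in re.split(r"\n\s*\n", text))
    return [block for block in blocks if block and keyword in block]


matches = blocks_with_keyword(data, "reportChange")
print("\n\n".join(matches))
```

A substring test avoids having to escape the keyword, at the cost of matching it anywhere in the block rather than only on the final line.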