Question

I have a large text string containing several blocks that look very similar:

text = ('\n\n(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n')

With the code below I can find every instance of a money amount:

import re
re.findall('\xa3(.*)', text)

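But this only returns "In the event of this happens a Fee of \xc2\xa32,000 gross" rather than the whole block. I want to get back every whole block that mentions the British pound sign, Unicode \xa3.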
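For reference, a minimal runnable version of the question's snippet, assuming the line break after "Fee" is a \n. In Python's re module, . matches any character except a newline unless re.DOTALL is set, and findall returns only the captured group, so the capture is cut off at the end of the line that contains the pound sign:

```python
import re

# Sample text from the question, assuming the break after "Fee" is '\n'.
text = ('\n\n(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n')

# '.' stops at '\n', and findall returns only what group 1 captured.
matches = re.findall('\xa3(.*)', text)
print(matches)
```
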

Answer 1

I suggest this regular expression:

import re

text = ('\n\nthis is not wanted\n\n'
        '(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n'
        'another wanted line with pound: \xc2\xa31,000\n\n'
        'this is also not wanted\n\n')

re.findall(r'(?:.+\n)*.*\xa3(?:.+\n)*', text)

This finds every multi-line block of non-empty lines that contains at least one \xa3.
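A self-contained run of the pattern on the sample above, with comments on what each part does. Each match keeps the \n that ends its last line, and the blank lines act as separators because .+ needs at least one character. Note that in Python 3, '\xc2\xa3' inside a str is two characters (Â followed by £), so the Â stays in the output:

```python
import re

text = ('\n\nthis is not wanted\n\n'
        '(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n'
        'another wanted line with pound: \xc2\xa31,000\n\n'
        'this is also not wanted\n\n')

# (?:.+\n)*  zero or more non-empty lines before the pound-sign line
# .*\xa3     that line up to the pound sign (re reads \xa3 as U+00A3)
# (?:.+\n)*  the rest of that line plus any following non-empty lines
blocks = re.findall(r'(?:.+\n)*.*\xa3(?:.+\n)*', text)
for block in blocks:
    print(repr(block))
```
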

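As @wiktor-stribiżew pointed out in the comments, this only handles blocks correctly when the pound sign is followed by another character on the same line. That seems to be what you want, so it is fine, but it is worth mentioning.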
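To illustrate the limitation, here is a made-up block in which the pound sign is the last character of its line. After the £ comes a \n, and (?:.+\n)* needs at least one non-newline character per repetition, so it matches zero times and the following line is dropped. The ADJUSTED pattern is only one possible workaround (an assumption, not part of the answer): it consumes the rest of the pound-sign line and its optional newline before collecting further lines, and it has not been tested against the asker's real data:

```python
import re

ORIGINAL = r'(?:.+\n)*.*\xa3(?:.+\n)*'
# Hypothetical adjustment: '.*\n?' takes the rest of the pound-sign line and
# its newline, so the trailing group can continue onto the next lines.
ADJUSTED = r'(?:.+\n)*.*\xa3.*\n?(?:.+\n)*'

# Hypothetical block in which the pound sign ends its line.
edge = 'first line\nprice in \xa3\namount follows here\n\n'

before = re.findall(ORIGINAL, edge)  # stops right after the pound sign
after = re.findall(ADJUSTED, edge)   # returns the whole block
print(before)
print(after)
```
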

Answer 2

Try this:

import re
text = '\n\nblock1\xa3block1.\n\nblock2\x80block2\n\nblock3\xa3block3\n\n'
# Keep only blocks containing the pound sign; block2 contains \x80
# (the euro sign in Windows-1252) instead, so it is skipped.
result = re.findall(r'.*\xa3.*', text)
print(result)
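Because . does not match a newline, this pattern works one line at a time, so it returns whole blocks only when every block fits on a single line, as in the sample above. A quick check on the two-line block from the question (assuming the break after "Fee" is a \n) shows that only the line containing the pound sign comes back:

```python
import re

# The two-line block from the question, assuming the break after "Fee" is '\n'.
text = ('\n\n(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n')

# Answer 2's pattern: '.' stops at '\n', so the "(d)..." line is not included.
line_matches = re.findall(r'.*\xa3.*', text)
print(line_matches)
```

For blocks that span several lines, a pattern that crosses line breaks, like the one in Answer 1, is needed.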

Python regex to find everything between \n\n and \n\n
