I have a document structured in the following format:
123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|DNR Order Verification:Scanned|
xyz pqs 123
[report_end]
123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|A Note|
xyz pqs 123
[report_end]
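Each record consists of:
How can I capture these three elements with a regular expression?
My approach is:
But I don't know how to accomplish this with a regular expression.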
Answer 0: (score: 1)
You can use the following:
r"((?:.*?\|){11}\s+(?:.*)\s+\[report_end\])"
Output:
Match 1. [0-157] `123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|DNR Order Verification:Scanned|
xyz pqs 123
[report_end]
Match 2. [159-292] `123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|A Note|
xyz pqs 123
[report_end]
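As a quick check of the output above, here is a minimal Python sketch that runs the same pattern with `re.findall` over the two sample records from the question. The names `sample` and `records` are illustrative and not part of the original answer.

```python
import re

# Sample text copied from the question (two records).
sample = (
    "123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|"
    "LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|"
    "DNR Order Verification:Scanned|\n"
    "xyz pqs 123\n"
    "[report_end]\n"
    "123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|"
    "LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|A Note|\n"
    "xyz pqs 123\n"
    "[report_end]\n"
)

pattern = r"((?:.*?\|){11}\s+(?:.*)\s+\[report_end\])"

# With a single capturing group, findall returns a list of strings,
# one per matched record.
records = re.findall(pattern, sample)

for number, record in enumerate(records, start=1):
    print("Match %d:" % number)
    print(record)
```

Each element of `records` should hold one complete record, from the first header field through `[report_end]`.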
Regex explanation
((?:.*?\|){11}\s+(?:.*)\s+\[report_end\])
Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Regex syntax only
Match the regex below and capture its match into backreference number 1 «((?:.*?\|){11}\s+(?:.*)\s+\[report_end\])»
   Match the regular expression below «(?:.*?\|){11}»
      Exactly 11 times «{11}»
      Match any single character that is NOT a line break character «.*?»
         Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
      Match the character “|” literally «\|»
   Match a single character that is a “whitespace character” «\s+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the regular expression below «(?:.*)»
      Match any single character that is NOT a line break character «.*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match a single character that is a “whitespace character” «\s+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the character “[” literally «\[»
   Match the character string “report_end” literally «report_end»
   Match the character “]” literally «\]»
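The same breakdown can be written into the pattern itself using Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch: the name `RECORD` and the shortened test strings are hypothetical. The last two lines illustrate the "Dot doesn't match line breaks" option: as written, the pattern only matches records whose body fits on a single line.

```python
import re

RECORD = re.compile(r"""
    (                    # group 1: the whole record
      (?:.*?\|){11}      # eleven fields, each lazily matched up to a pipe
      \s+                # whitespace (the line break after the header)
      (?:.*)             # the body line; dot stops at a line break
      \s+                # whitespace (the line break after the body)
      \[report_end\]     # the literal terminator
    )
""", re.VERBOSE)

# Hypothetical shortened records, each with an 11-field header.
header = "a|b|c|d|e|f|g|h|i|j|k|"
one_line_body = header + "\nxyz pqs 123\n[report_end]"
two_line_body = header + "\nxyz\npqs 123\n[report_end]"

print(RECORD.search(one_line_body) is not None)  # True
print(RECORD.search(two_line_body) is not None)  # False
```

If bodies can span several lines, the body part of the pattern would need to change (for example a lazy match combined with `re.DOTALL`); that variation is not part of the original answer.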
Update based on your comment
To get 3 groups, you can use:
r"((?:.*?\|){11})\s+(.*)\s+(\[report_end\])"
To loop over all the groups:
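import re

# `string` is assumed to hold the full document text
pattern = re.compile(r"((?:.*?\|){11})\s+(.*)\s+(\[report_end\])")
for match1, match2, match3 in pattern.findall(string):
    # match1 = header line, match2 = body line, match3 = terminator
    print(match1 + "\n" + match2 + "\n" + match3 + "\n")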
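For readers who prefer names over positional groups, the three-group pattern can also be sketched with named groups and `re.finditer`. The group names `header`, `body`, and `end`, and the variables `sample` and `parsed`, are assumptions made for illustration; the original answer uses plain numbered groups.

```python
import re

# Sample text copied from the question (two records).
sample = (
    "123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|"
    "LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|"
    "DNR Order Verification:Scanned|\n"
    "xyz pqs 123\n"
    "[report_end]\n"
    "123456789|XXX|1234567|05/05/2012 00:00|81900153|Signed|"
    "LASTNAME,FIRSTNAME, M.S.|024813|XXX|3410080|A Note|\n"
    "xyz pqs 123\n"
    "[report_end]\n"
)

# Same structure as the three-group pattern, with named groups.
RECORD = re.compile(
    r"(?P<header>(?:.*?\|){11})\s+(?P<body>.*)\s+(?P<end>\[report_end\])"
)

parsed = []
for match in RECORD.finditer(sample):
    # The header ends with a pipe, so split() yields a trailing empty
    # string; slicing it off leaves the 11 fields.
    fields = match.group("header").split("|")[:-1]
    parsed.append((fields, match.group("body"), match.group("end")))

for fields, body, end in parsed:
    print(fields[-1], "|", body, "|", end)
```

Splitting the header on `|` afterwards keeps the regex short while still giving access to the individual fields.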
Answer 1: (score: 1)