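Can anyone help with a regular expression that extracts the text phrases following "Title:" in the text below? (The relevant parts were bolded just to make clear what should be extracted.)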
Title: Anorectal Fistula (Fistula-in-Ano) Procedure Code(s): Effective date: 7/1/07 Title: 2003247 or previous effective dates) Title: ST2 Assay for Chronic Heart Failure Description/Background Heart Failure HF is one among many cardiovascular diseases that comprises a major cause of morbidity and mortality worldwide. The term “heart failure” (HF) refers to a complex clinical syndrome .
I am using this regex: (?:Title: \n+(.*))|(?:Title:\n+(.*))|(?<=Title: )(.*)(?=Procedure)
However, it does not seem to capture these phrases correctly. I am using Python 2.7.12.
Answer 0: (score: 0)
I suggest using
Title:\s*(.*?)\s*Procedure|Title:\s*(.*)
See the regex demo.
Details:
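Title: - the literal text Title:
\s* - 0+ whitespace characters
(.*?) - Group 1: any 0+ characters other than line break characters, as few as possible, up to the first match of the subpattern that follows
\s*Procedure - 0+ whitespace characters followed by the string Procedure
| - or
Title:\s* - the string Title: followed by 0+ whitespace characters
(.*) - Group 2: any 0+ characters other than line break characters, as many as possible (that is, the rest of the line).

import re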
regex = r"Title:\s*(.*?)\s*Procedure|Title:\s*(.*)"
test_str = ("Title: Anorectal Fistula (Fistula-in-Ano) Procedure Code(s):\n\n"
"Effective date: 7/1/07\n\n"
"Title:\n\n"
"2003247\n\n"
"or previous effective dates)\n\n"
"Title:\n\n"
"ST2 Assay for Chronic Heart Failure\n\n"
"Description/Background\n\n"
"Heart Failure\n\n"
"HF is one among many cardiovascular diseases that comprises a major cause of morbidity and mortality worldwide. The term “heart failure” (HF) refers to a complex clinical syndrome .")
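res = []
for m in re.finditer(regex, test_str):
    if m.group(1):
        # The first alternative matched: the title ends before "Procedure"
        res.append(m.group(1))
    else:
        # The second alternative matched: the title is the rest of the line
        res.append(m.group(2))
print(res)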
# => ['Anorectal Fistula (Fistula-in-Ano)', '2003247', 'ST2 Assay for Chronic Heart Failure']
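If you need this in more than one place, the loop above can be folded into a small helper. This is only a sketch: the names TITLE_RE and extract_titles are made up for illustration, and the sample string is a shortened version of the text from the question. It relies on m.lastindex, which is the number of the last capturing group that took part in the match, so it is 1 when the first alternative matched and 2 when the second one did.

```python
import re

# Same pattern as above: alternative 1 stops before "Procedure",
# alternative 2 captures the rest of the line.
TITLE_RE = re.compile(r"Title:\s*(.*?)\s*Procedure|Title:\s*(.*)")

def extract_titles(text):
    """Return the phrase following each 'Title:' in text (hypothetical helper)."""
    # Exactly one of the two groups participates in every match,
    # so m.lastindex tells us which one to read.
    return [m.group(m.lastindex) for m in TITLE_RE.finditer(text)]

# Shortened sample based on the text in the question
sample = ("Title: Anorectal Fistula (Fistula-in-Ano) Procedure Code(s):\n\n"
          "Title:\n\n"
          "ST2 Assay for Chronic Heart Failure\n\n"
          "Description/Background\n\n")

titles = extract_titles(sample)
print(titles)
# => ['Anorectal Fistula (Fistula-in-Ano)', 'ST2 Assay for Chronic Heart Failure']
```

Unlike the if m.group(1) check, this also keeps an empty Group 1 (for example "Title: Procedure") as an empty string instead of falling through to Group 2, which would be None in that case.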