How can I make a group non-greedy?

Time: 2019-05-01 01:58:04

Tags: python regex

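I'm trying to use a regex pattern to extract part of a string. The file contains a number of headers, and all headers share the same format. I'm currently using Python and would like to keep it that way.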

Here is a sample of the file I'm working with:

TI TEST TEST TEST TEST TEST TEST TEST TEST AJSAOISJAO SOAI
   ASASPAOS
SO EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA 
AB Purpose
   To examine the evidence supporting the use of simulation-based assessments as surrogates for patient-related outcomes assessed in the workplace.
   Method
   The authors systematically searched MEDLINE, EMBASE, Scopus, and key journals through February 26, 2013. They included original studies that assessed health professionals and trainees using simulation and then linked those scores with patient-related outcomes assessed in the workplace. Two reviewers independently extracted information on participants, tasks, validity evidence, study quality, patent-related and simulation-based outcomes, and magnitude of correlation. All correlations were pooled using random-effects meta-analysis.
   Results
   Of 11,628 potentially relevant articles, the 33 included studies enrolled 1,203 participants, including postgraduate physicians (n = 24 studies), practicing physicians (n = 8), medical students (n = 6), dentists (n = 2), and nurses (n = 1). The pooled correlation for provider behaviors was 0.51 (95% confidence interval [Cl], 0.38 to 0.62; n = 27 studies); for time behaviors, 0.44 (95% Cl, 0.15 to 0.66; n = 7); and for patient outcomes, 0.24(95% Cl, 0.02 to 0.47; n = 5). Most reported validity evidence was favorable, though studies often included only correlational evidence. Validity evidence of internal structure (n = 13 studies), content (n = 12), response process (n = 2), and consequences (n = 1) were reported less often. Three tools showed large pooled correlations and favorable (albeit incomplete) validity evidence.
   Conclusions
   Simulation-based assessments often correlate positively with patient-related outcomes. Although these surrogates are imperfect, tools with established validity evidence may replace workplace-based assessments for evaluating select procedural skills.
OI MANEIRAO MANEIRAOMANEIRAOMANEIRAO MANEIRAO
SN 6516516516
EI 849819981981
PD FEB
PY 2015

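My current goal is to capture the entire text of the "AB" header. It's worth noting that the length and format of the AB content don't vary much: it is always paragraphs or lines of text that run until the next header.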

I have tried several different regex patterns. The one that got me closest to what I want is:

\nAB ((.*?\n)+)(\n[A-Z]{2}\s)?

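However, it consumes every header it finds, all the way to the end of the file. I want the pattern to stop matching as soon as it reaches the next header after AB, whatever that header is.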

Headers always follow the same pattern: a newline, followed by two uppercase letters and a space, or:

\n[A-Z]{2}\s

Thanks to anyone who can help in any way.

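My question differs from the usual greedy vs. non-greedy questions because the match should not be ended by a single non-greedy character, but by an entire "stop" group.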
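To make the behaviour described above concrete, here is a minimal sketch that runs the attempted pattern against a shortened stand-in for the sample file. The `sample` text and the variable names are assumptions made for illustration only. The lazy `.*?` limits matching within a single line, but the outer `+` is still greedy, so the group keeps taking lines through the last newline and the optional header group never gets a chance to stop it.

```python
import re

# Shortened, hypothetical stand-in for the sample file above.
sample = (
    "TI TEST TEST\n"
    "   ASASPAOS\n"
    "SO EITCHA EITCHA\n"
    "AB Purpose\n"
    "   To examine the evidence.\n"
    "   Conclusions\n"
    "   Simulation-based assessments often correlate positively.\n"
    "OI MANEIRAO\n"
    "SN 6516516516\n"
    "PY 2015\n"
)

# The pattern from the question, unchanged.
attempt = re.compile(r"\nAB ((.*?\n)+)(\n[A-Z]{2}\s)?")

match = attempt.search(sample)
captured = match.group(1)

# The capture runs past the AB block and includes the later headers too.
print(captured)
```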

1 answer:

Answer 0: (score: 2)

Is this what you're looking for?

^AB ([\w\W]*?)(?=\n[A-Z]{2}\s)


(?=...) is a positive lookahead. It asserts that the given subpattern can be matched at this position, without consuming any characters.
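As a sketch of how the pattern above might be used from Python: `^` only matches at the start of each line when `re.MULTILINE` is set, so that flag is passed here. The `sample` text is a shortened, hypothetical stand-in for the file in the question, and the names `ab_pattern` and `ab_text` are illustrative.

```python
import re

# Shortened, hypothetical stand-in for the file in the question.
sample = (
    "TI TEST TEST\n"
    "   ASASPAOS\n"
    "SO EITCHA EITCHA\n"
    "AB Purpose\n"
    "   To examine the evidence.\n"
    "   Conclusions\n"
    "   Simulation-based assessments often correlate positively.\n"
    "OI MANEIRAO\n"
    "SN 6516516516\n"
    "PY 2015\n"
)

# re.MULTILINE lets ^ match at the start of the "AB " line.
# [\w\W]*? lazily matches any character, newlines included, and the
# lookahead stops it just before the next "\nXX " header.
ab_pattern = re.compile(r"^AB ([\w\W]*?)(?=\n[A-Z]{2}\s)", re.MULTILINE)

match = ab_pattern.search(sample)
ab_text = match.group(1) if match else None

print(ab_text)
```

The captured text keeps the leading spaces of the continuation lines. Because the lookahead requires a following header, this pattern finds nothing if AB happens to be the last header in the file.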