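So I'm trying to process the following text. I want one match per class, starting at the line with the class's credits and ending with the season and year. For the first class, it would look like this: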
3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
I also need to capture the requirements they still need. For example, you can see that they are missing 3 credits in History. Here is my text:
3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
Student View
3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016
3 credits in Math
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Natural Science
BIOL 225L
TOPICS IN NUTRITION
A-
3
Spring 2015
3 credits Ethics/Applied Ethics/Religious Studies
REST 209L
WORLD RELIGIONS
A-
3
Spring 2015
3 credits in Social Science
ECON 104L
PRINC MACROECONOM
T
3
Fall 2014
Answer 0 (score: 0)
(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)
You can use it with findall. See the demo.
https://regex101.com/r/gK9aI6/1
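import re

# Each match starts at a "<n> credits" line and lazily runs up to the next one (or the end of the text).
p = re.compile(r'(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)')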
test_str = "3 credits in Philosophical Perspectives\nPHIL 101L\nPHILOSOPHICAL PERSPECTIVES\nB\n3\nFall 2014\nStudent View\n3 credits in Fine Arts\nART 160L\nHIST WEST ART I\nB+\n3\nFall 2014\n3 credits in History\nStill Needed:\nClick here to see classes that satisfy this requirement.\n3 credits in Literature\nENG 201L\nINTRO LINGUISTIC\nIP\n(3)\nSpring 2016\n3 credits in Math\nStill Needed:\nClick here to see classes that satisfy this requirement.\n3 credits in Natural Science\nBIOL 225L\nTOPICS IN NUTRITION\nA-\n3\nSpring 2015\n3 credits Ethics/Applied Ethics/Religious Studies\nREST 209L\nWORLD RELIGIONS\nA-\n3\nSpring 2015\n3 credits in Social Science\nECON 104L\nPRINC MACROECONOM\nT\n3\nFall 2014"
# findall returns one string per requirement block
print(p.findall(test_str))
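A minimal sketch building on this answer to cover the second part of the question: it reuses the same pattern and then sorts each block by whether it contains a "Still Needed:" line. The sample is only the first four requirements from the question, and the function name split_requirements is hypothetical.

```python
import re

# Excerpt of the transcript from the question (first four requirements only).
TEXT = """\
3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
Student View
3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016"""

# Same pattern as above: a block runs from a "<n> credits" line up to the
# next "<n> credits" line or the end of the string.
BLOCK_RE = re.compile(r"(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)")


def split_requirements(text):
    """Hypothetical helper: return (taken, still_needed) lists of blocks."""
    taken, still_needed = [], []
    for block in BLOCK_RE.findall(text):
        # A requirement with no course yet carries a "Still Needed:" line.
        if "\nStill Needed:" in block:
            # Keep only the header line, e.g. "3 credits in History".
            still_needed.append(block.split("\n", 1)[0])
        else:
            # Completed or in-progress course (e.g. grade "IP").
            taken.append(block)
    return taken, still_needed


taken, still_needed = split_requirements(TEXT)
print(len(taken), still_needed)
```

Note that any extra line between requirements, such as "Student View", stays attached to the block before it.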
Answer 1 (score: 0)
You can combine a non-greedy "anything" sequence with the known structure of the last line of each group to split the text into blocks:
/((?:.\n?)*?(?:Fall|Summer|Spring|Winter)\s\d{4})/g
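(?:.\n?)*? - lazily consumes any character, optionally followed by a newline, one at a time, until the rest of the pattern can match
(?:Fall|Summer|Spring|Winter)\s\d{4} - the season, a whitespace character, and a four-digit year that close each block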
See the demo here, and note that each block is a single regex match.
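For reference, a rough Python translation of this pattern (the /g flag corresponds to re.findall; TERM_RE is a hypothetical name). Running it on an excerpt of the question's text suggests two side effects of ending blocks only at a season and year: the match attempt at the newline after "Fall 2014" fails, so the next block begins at "Student View", and the History requirement, which has no term, is merged into the Literature block.

```python
import re

# Excerpt of the transcript from the question (first four requirements only).
TEXT = """\
3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
Student View
3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016"""

# Without re.DOTALL, "." does not match a newline, so "\n?" lets the lazy
# loop step over line breaks until a season and four-digit year appear.
TERM_RE = re.compile(r"((?:.\n?)*?(?:Fall|Summer|Spring|Winter)\s\d{4})")

# With one capture group, findall returns the group text of each match.
blocks = TERM_RE.findall(TEXT)
for block in blocks:
    print(block)
    print("---")
```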
Answer 2 (score: 0)
Try the following snippet:
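import re

courses = r"....your...content"
# \d+ starts a block at a number; .*? (which spans newlines thanks to DOTALL)
# runs lazily up to the first "Fall" or "Spring" followed by a four-digit year.
rx = re.compile(r"\d+.*?(?:FALL|SPRING)\s*\d{4}", re.IGNORECASE | re.DOTALL)
for course in rx.finditer(courses):
    print(course.group())
    print("----------------------------\n")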
If courses contains your sample content, the output is:
3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
----------------------------
3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
----------------------------
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016
----------------------------
... omitting rest....
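As the output shows, the History requirement, which has no course or term, is swallowed by the Literature block, and terms other than Fall and Spring would not end a block. A sketch of one possible variation, assuming every requirement header is a line of the form "<n> credits ...": it anchors matches at those header lines and uses a tempered token so a match cannot run into the next header. COURSE_RE and NEEDED_RE are hypothetical names, and the sample is an excerpt of the question's text.

```python
import re

# Excerpt of the transcript from the question (first four requirements only).
TEXT = """\
3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
Student View
3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016"""

# "^" with MULTILINE anchors each match at the start of a "<n> credits" line.
# "(?:(?!\n\d+\s+credits).)*?" is a tempered token: it never consumes the
# newline that begins the next requirement header, so a header without a
# course cannot absorb the following requirement. All four seasons end a block.
COURSE_RE = re.compile(
    r"^\d+\s+credits(?:(?!\n\d+\s+credits).)*?(?:Fall|Spring|Summer|Winter)\s+\d{4}",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)

# Requirements with no course yet: the header line directly above "Still Needed:".
NEEDED_RE = re.compile(r"^(\d+\s+credits.*)\nStill Needed:", re.MULTILINE)

for course in COURSE_RE.finditer(TEXT):
    print(course.group())
    print("----------------------------")

print(NEEDED_RE.findall(TEXT))
```

With this variation, History produces no course match and appears only in the NEEDED_RE result, and the stray "Student View" line is left out of every block.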