Example program
import re
demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*(?=,|\d)", demostr).group()
print(org)
Output
Department of Microbiology and Immunology. Faculty of Tropical Medicine
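The program extracts the organization or department from the given string. It works well when a comma (,) follows "Immunology". However, when a dot (.) follows the organization name, the match runs on to the next comma and the wrong text is extracted.
The desired output is shown below.
Expected output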
Department of Microbiology and Immunology
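Answer 0 (score: 0)
Your regex was almost right; it only missed two things. Here is the corrected pattern:
([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)
What you missed
[^,\d]* - the * quantifier is greedy by nature; for your requirement you need to make it lazy, i.e. [^,\d]*?
\. - you did not include a literal dot in the lookahead alternation, so the match could not stop at "."
Code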
import re
demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
print(org)
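To see the effect of the two changes side by side, here is a small sketch that runs the original and the corrected pattern on the same input. The names PREFIX, greedy and lazy are only illustrative and do not come from the question.

```python
import re

demostr = ("Department of Microbiology and Immunology. Faculty of Tropical Medicine, "
           "Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th.")

# Part shared by both patterns (illustrative name, not from the question).
PREFIX = r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)"

# Original: greedy tail, and the lookahead only stops at a comma or a digit.
greedy = re.search(PREFIX + r"[^,\d]*(?=,|\d)", demostr).group(0)

# Corrected: lazy tail, and the lookahead also stops at a literal dot.
lazy = re.search(PREFIX + r"[^,\d]*?(?=,|\.|\d)", demostr).group(0)

print(greedy)
print(lazy)
```

The greedy version consumes everything up to the first comma, while the lazy version stops at the first position where the lookahead succeeds, which here is the dot after "Immunology".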
Answer 1 (score: 0)
Please try the following code.
import re
demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
print(org)
Output
Department of Microbiology and Immunology
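One thing to keep in mind: re.search returns None when nothing matches, so calling .group(0) directly raises AttributeError on input that contains none of the keywords. A minimal sketch of a guard follows; the helper name extract_org and the constant ORG_PATTERN are hypothetical and only used for illustration.

```python
import re

# Same corrected pattern as above, compiled once (ORG_PATTERN is a hypothetical name).
ORG_PATTERN = re.compile(
    r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)"
    r"[^,\d]*?(?=,|\.|\d)"
)

def extract_org(text):
    """Return the first organization-like match, or None if nothing matches."""
    match = ORG_PATTERN.search(text)
    return match.group(0) if match else None

print(extract_org("Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University"))
print(extract_org("no keyword here"))
```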