Regular expression to extract an organization name in Python

Time: 2018-12-12 04:24:54

Tags: python regex

Sample program

import re

demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*(?=,|\d)", demostr).group()
print(org)   

Output

Department of Microbiology and Immunology. Faculty of Tropical Medicine

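The program extracts the organization or department name from the given string. It works well when a comma (,) follows Immunology. However, when a period (.) follows the organization name, as it does here, the match runs on to the next comma and the output is wrong. The desired output is shown below.

Expected output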

Department of Microbiology and Immunology

2 answers:

Answer 0: (score: 0)

Your regex was nearly right; two small changes make it work:

([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)

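What you missed:

  • [^,\d]* - this quantifier is greedy by default; for your case it needs to be lazy: [^,\d]*?
  • \. - the alternation inside the lookahead, (?=,|\d), did not include the period.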
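To see why both changes are needed together, here is a minimal sketch that tries each combination on a shortened version of the string. The stripped-down patterns and the names text, variants and results are illustrative only and are not part of the original question.

```python
import re

text = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University"

# Four variants of the tail of the pattern: greedy or lazy quantifier,
# with or without the period in the lookahead.
variants = {
    "original (greedy, no period)": r"Department[^,\d]*(?=,|\d)",
    "lazy only": r"Department[^,\d]*?(?=,|\d)",
    "period only": r"Department[^,\d]*(?=,|\.|\d)",
    "lazy and period": r"Department[^,\d]*?(?=,|\.|\d)",
}

results = {name: re.search(pattern, text).group() for name, pattern in variants.items()}
for name, matched in results.items():
    print(f"{name}: {matched}")
```

Only the last variant stops at Immunology. With a greedy quantifier the match always extends to the last position before the first comma, and with a lazy quantifier but no period in the lookahead there is nothing to stop on before that comma.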
  

Code

    import re

    demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
    org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
    print(org) 


Answer 1: (score: 0)

Try the following code.

import re

demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
print(org)  

Output

Department of Microbiology and Immunology
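Note that calling .group() directly on the result of re.search raises AttributeError when nothing matches, because re.search returns None in that case. A minimal sketch of a guarded version follows; the names ORG_PATTERN and extract_org are hypothetical and do not appear in the answers.

```python
import re

# Same pattern as in the answers, split across lines for readability.
ORG_PATTERN = re.compile(
    r"([A-Z][^\s,.]+[.]?\s[(]?)*"
    r"(Dept|Association|Office|University|Department)"
    r"[^,\d]*?(?=,|\.|\d)"
)

def extract_org(text):
    """Return the first organization-like match, or None if there is none."""
    match = ORG_PATTERN.search(text)
    return match.group(0) if match else None

demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."

print(extract_org(demostr))
print(extract_org("Mahidol University, Thailand"))
print(extract_org("no organization here"))
```

One limitation to keep in mind: the lookahead requires a comma, period or digit after the name, so a string that ends right after the organization name (for example "Mahidol University" with no trailing punctuation) produces no match.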