Print names in "First name Last name" format

Posted: 2018-08-02 01:00:06

Tags: regex python-3.x spacy data-extraction

I have a text file that contains data like the following, where in some of the cases the name is written as "Last name, First name":

For example:

The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo

I want the output to be:

John Douglas
Rob Potter
Alisa Russo

I am using this code:

print(str(string.partition(',')[2].split()[0] +" "+string.partition(',')[0].split()[0]))
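To see why this one-liner is not enough, here is a quick check that runs the same expression (without the redundant str()) on each sample line. The names `lines` and `one_liner` are introduced only for this illustration.

```python
lines = [
    "The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina",
    "The patient was referred by Potter, Rob,M.D.",
    "Sam was referred by Dr. Alisa Russo",
]

def one_liner(string):
    # Same expression as in the question: first word after the first comma,
    # followed by the first word of the whole line
    return string.partition(',')[2].split()[0] + " " + string.partition(',')[0].split()[0]

for line in lines:
    try:
        print(one_liner(line))
    except IndexError:
        # No comma: partition(',')[2] is '' and ''.split() is an empty list
        print("IndexError (no comma in the line)")
```

It prints "John, The" and "Rob,M.D. The" for the first two lines and fails on the third, because it ignores the "Dr." / "M.D." context entirely and assumes every line contains a comma.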

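3 Answers:

Answer 0 (score: 1)

You can first find the names that are preceded by "Dr. " or followed by ", M.D.", and then, when printing each name, swap the order of its parts if it contains a comma: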

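import re
data = '''The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo'''
name = r"[a-z'-]+,? [a-z'-]+"  # e.g. "Douglas, John" or "Alisa Russo"
# A name preceded by "Dr. " or followed by ", M.D." (dots escaped so they match literally)
pattern = r"(?<=Dr\. ){0}|{0}(?=,\s*M\.D\.)".format(name)
for match in re.findall(pattern, data, re.IGNORECASE):
    # "Last, First" -> "First Last"; names without a comma are printed unchanged
    print(' '.join(match.split(', ')[::-1]) if ', ' in match else match)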

This outputs:

John Douglas
Rob Potter
Alisa Russo

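Answer 1 (score: 1)

The first challenge is capturing the doctor's first and last names. This is tricky because the names appear in several messy formats. A regex with a few alternatives can help, for example:

(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)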


Code Sample

import re

# Three alternatives:
#   1. "Dr. First Last"                                      -> groups 1, 2
#   2. "Dr. Last, First"                                     -> groups 3, 4
#   3. "Last, First,M.D." (comma, space and dots optional)   -> groups 5, 6
regex = r"(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)"

test_str = ("The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina\n"
    "The patient was referred by Potter, Rob,M.D.\n"
    "Sam was referred by Dr. Alisa Russo")

results = []

for match in re.finditer(regex, test_str):
    if match.group(1):
        # Already in "First Last" order
        results.append([match.group(1), match.group(2)])
    elif match.group(3):
        # "Last, First": swap to [first, last]
        results.append([match.group(4), match.group(3)])
    elif match.group(5):
        results.append([match.group(6), match.group(5)])

print(results)

The output is a list of [first, last] lists, which makes printing straightforward:

[['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]
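A minimal sketch of that printing step, assuming `results` holds the list of lists above with each inner list in [first, last] order:

```python
results = [['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]

# Join each [first, last] pair into one "First Last" string
lines = [f"{first} {last}" for first, last in results]
for line in lines:
    print(line)
```

This prints the three names one per line, matching the output requested in the question.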

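Answer 2 (score: 0)

Honestly, I would grab the names first, using a regex. Once you have them, swap the first and last names depending on whether there is a ','. Don't try to do it all in one step.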
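The two-step idea (extract first, reorder second) could look roughly like this sketch. The helper names `grab_names` and `first_last` are hypothetical, and the pattern assumes only the two formats from the question: a name after "Dr. " or a name before ", M.D." / ",M.D.".

```python
import re

# Assumed name shape: two words, optionally separated by a comma
NAME = r"[A-Za-z'-]+,? [A-Za-z'-]+"
PATTERN = re.compile(rf"(?<=Dr\. ){NAME}|{NAME}(?=,\s*M\.D\.)")

def grab_names(text):
    """Step 1: only extract the raw name strings, in order of appearance."""
    return PATTERN.findall(text)

def first_last(raw):
    """Step 2: reorder "Last, First" to "First Last"; leave "First Last" alone."""
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        return f"{first} {last}"
    return raw

text = """The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo"""

for raw in grab_names(text):
    print(first_last(raw))
```

Keeping the two steps separate means the extraction pattern can be extended for new formats without touching the reordering logic.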