Question

I have a text file that contains the following data:

Last name, First name in some of the cases

For example:

The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo

I want the output to be:

John Douglas
Rob Potter
Alisa Russo

The code I am using is:

print(str(string.partition(',')[2].split()[0] +" "+string.partition(',')[0].split()[0]))

Answer 1

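You can first find the names that are either preceded by "Dr. " or followed by "M.D.", and then, when outputting each name, swap the order of its parts if it contains a comma: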

import re
data = '''The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo'''
for name in re.findall(r"(?<=Dr\. ){0}|{0}(?=,\s*M\.D\.)".format("[a-z'-]+,? [a-z'-]+"), data, re.IGNORECASE):
    print(' '.join(name.split(', ')[::-1]) if ', ' in name else name)

This outputs:

John Douglas
Rob Potter
Alisa Russo
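
As a rough sketch of the same idea, the pattern can be built from a named piece and wrapped in a function so it is easier to read and reuse. The names `NAME`, `PATTERN` and `referrer_names` are made up for this illustration and are not part of the answer above; the literal dots are escaped here so that "M.D." only matches real periods.

```python
import re

# One name: "Last, First" or "First Last" (letters, apostrophes, hyphens).
NAME = r"[a-z'-]+,? [a-z'-]+"

# A name counts only if it is preceded by "Dr. " or followed by ", M.D.".
PATTERN = re.compile(
    r"(?<=Dr\. )" + NAME + "|" + NAME + r"(?=,\s*M\.D\.)",
    re.IGNORECASE,
)


def referrer_names(text):
    """Return the doctor names found in text, each as 'First Last'."""
    names = []
    for name in PATTERN.findall(text):
        if ", " in name:
            # "Last, First" -> "First Last"
            last, first = name.split(", ", 1)
            name = f"{first} {last}"
        names.append(name)
    return names


data = """The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo"""

print("\n".join(referrer_names(data)))
```

Note that "Acosta, Christina" is skipped on purpose: it has neither a "Dr. " prefix nor an "M.D." suffix.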

Answer 2

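The first challenge is capturing the doctors' first and last names. That is difficult because some names are hairy. A regex with a few alternations can help you, for example: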

(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)

Code sample:

import re

regex = r"(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)"

test_str = ("The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina\n"
    "The patient was referred by Potter, Rob,M.D.\n"
    "Sam was referred by Dr. Alisa Russo")

matches = re.finditer(regex, test_str, re.MULTILINE)
results = []

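for match in matches:
    # Only one alternative takes part in each match, so test them in turn.
    if match.group(1):      # "Dr. First Last"
        results.append([match.group(1), match.group(2)])
    elif match.group(3):    # "Dr. Last, First"
        results.append([match.group(4), match.group(3)])
    elif match.group(5):    # "Last, First, M.D."
        results.append([match.group(6), match.group(5)])

print(results)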

The output is a list of lists. After that, printing becomes very easy.

[['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]
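
For instance, a minimal sketch of that printing step, assuming `results` holds the list of lists shown above (the variable `lines` is introduced only for this illustration):

```python
results = [['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]

# Join each [first, last] pair with a space, one name per line.
lines = [" ".join(pair) for pair in results]
print("\n".join(lines))
```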

Answer 3

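Honestly, I would grab the name first, using a regex. Once you have it, swap the first and last name based on the ','. Don't try to do it all in one go.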
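
A possible sketch of that two-step approach follows. It rests on an assumption that goes beyond this answer: every line of interest contains the phrase "referred by" directly before the name. The names `NAME_AFTER_REFERRAL`, `to_first_last` and `extract_names` are hypothetical helpers made up for the example.

```python
import re

# Step 1: grab the name that follows "referred by", skipping an optional "Dr. ".
# Assumption (not from the answer): names always come right after "referred by".
NAME_AFTER_REFERRAL = re.compile(
    r"referred by (?:Dr\. )?([A-Za-z'-]+,? [A-Za-z'-]+)"
)


def to_first_last(raw_name):
    """Step 2: turn 'Last, First' into 'First Last'; leave 'First Last' as is."""
    if "," in raw_name:
        last, first = (part.strip() for part in raw_name.split(",", 1))
        return f"{first} {last}"
    return raw_name


def extract_names(text):
    """Run both steps over every referral found in text."""
    return [to_first_last(m.group(1)) for m in NAME_AFTER_REFERRAL.finditer(text)]


sample = (
    "The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina\n"
    "The patient was referred by Potter, Rob,M.D.\n"
    "Sam was referred by Dr. Alisa Russo"
)

for name in extract_names(sample):
    print(name)
```

Keeping the swap in its own function means the regex only has to find the name, and the ordering rule can be changed or tested separately.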

Print names in first name, last name format