I have a text file containing lines like the ones below. In some cases the name is written as "Last name, First name".
For example:
The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo
I want the output to be:
John Douglas
Rob Potter
Alisa Russo
The code I am using is:
print(str(string.partition(',')[2].split()[0] +" "+string.partition(',')[0].split()[0]))
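Answer 0 (score: 1)
You can first find the names that are preceded by "Dr. " or followed by "M.D.", and then, when outputting each name, swap the order of its parts if it contains a comma: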
import re
data = '''The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo'''
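# Match a name that follows "Dr. " or that precedes ", M.D." (dots escaped so they match literally).
for name in re.findall(r"(?<=Dr\. ){0}|{0}(?=,\s*M\.D\.)".format("[a-z'-]+,? [a-z'-]+"), data, re.IGNORECASE):
    # Swap "Last, First" into "First Last"; leave names without a comma unchanged.
    print(' '.join(name.split(', ')[::-1]) if ', ' in name else name)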
This outputs:
John Douglas
Rob Potter
Alisa Russo
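Answer 1 (score: 1)
The first challenge is capturing the doctor's first and last name. That is difficult because some of the names are tricky. A regular expression with a few alternations can help, for example: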
(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)
import re

regex = r"(?:Dr\. )(\w+) (\w+)|(?:Dr\. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)"
test_str = ("The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina\n"
            "The patient was referred by Potter, Rob,M.D.\n"
            "Sam was referred by Dr. Alisa Russo")
matches = re.finditer(regex, test_str, re.MULTILINE)
results = []
for match in matches:
    if match.group(1):
        # "Dr. First Last": already in output order.
        results.append([match.group(1), match.group(2)])
    elif match.group(3):
        # "Dr. Last, First": swap the two parts.
        results.append([match.group(4), match.group(3)])
    elif match.group(5):
        # "Last, First, M.D.": swap the two parts.
        results.append([match.group(6), match.group(5)])
print(results)
The output is a list of lists. After that, printing becomes very easy.
[['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]
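As a small illustration of that last step, here is a sketch that assumes `results` already holds the list of lists shown above (it is hard-coded here so the snippet runs on its own). The name `lines` is only illustrative.

```python
# Assumed input: the [first, last] pairs produced by the loop above.
results = [['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]

# Join each [first, last] pair into a single "First Last" string.
lines = [' '.join(pair) for pair in results]

# Print one name per line.
print('\n'.join(lines))
```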
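Answer 2 (score: 0)
Honestly, I would grab the names first, using a regular expression. Once you have them, swap the first and last name based on the ','. Don't try to do it all in one step.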
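The two-step idea could be sketched roughly as follows. This answer gives no code, so everything here is an assumption: the pattern, the helper names `grab_names` and `normalize`, and the rule that a name is either preceded by "Dr." or followed by ", M.D." (taken from the sample lines in the question). Treat it as one possible reading, not the answerer's implementation.

```python
import re

# Assumed building block: a single name part made of letters, apostrophes, or hyphens.
NAME = r"[A-Za-z'-]+"

# Assumed pattern: a name after "Dr." or a "Last, First" pair before ", M.D.".
PATTERN = re.compile(
    rf"Dr\.\s+({NAME},?\s+{NAME})"
    rf"|({NAME},\s*{NAME}),\s*M\.D\."
)

def grab_names(text):
    """Step 1: return the raw name strings exactly as they appear in the text."""
    return [m.group(1) or m.group(2) for m in PATTERN.finditer(text)]

def normalize(raw):
    """Step 2: turn "Last, First" into "First Last"; leave other names unchanged."""
    if ',' in raw:
        last, first = [part.strip() for part in raw.split(',', 1)]
        return f"{first} {last}"
    return raw

data = """The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo"""

names = [normalize(raw) for raw in grab_names(data)]
for name in names:
    print(name)
```

Keeping extraction and reordering in separate functions makes each step easy to check on its own, for instance by printing the result of `grab_names(data)` before any swapping happens.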