I have these strings...
text1="% 4 Jérome Dekeyser + Corneille Wellens? "
text2="Matthew Sadler + Jon Speelman? 7 —"
text3="Martin Wostenholme + Frank Dancevic? “ere"
text4="7 4 Albert Lammens + Paul de Borman?"
text5="x Frans Gommers + Jeroen Simaeys?"
text6=" NSIe Darryl Johansen +George Xie? "
text7="Joseph Cludts + Herman \Verbauwen? "
I only want to extract the names, so that I end up with:
Jérome Dekeyser + Corneille Wellens
Matthew Sadler + Jon Speelman
Martin Wostenholme + Frank Dancevic
Albert Lammens + Paul de Borman
Frans Gommers + Jeroen Simaeys
Darryl Johansen +George Xie
Joseph Cludts + Herman Verbauwen
The + sign can be left out of the output. Something like this is the kind of result I'm after...
Matthew Sadler ,Jon Speelman
Answer 0 (score: 1)
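This regex could probably be refined. It handles the cleaner examples (text1 to text4) and three-word names such as Armin van Grünwald, because it tries a three-word match before falling back to two words, and the (?!\d) lookahead skips tokens that start with a digit. It does not handle noise made of letters: in text5 and text6 the leading x and NSIe are captured as part of the first name, and in text7 the stray backslash means Herman \Verbauwen is not matched at all.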
import re as regex
text1="% 4 Jérome Dekeyser + Corneille Wellens? "
extract_names = regex.findall(r'\b(?!\d)\w+\s\w+\s\w+\b|\b(?!\d)\w+\s\w+\b', text1)
print(extract_names)
# outputs
# ['Jérome Dekeyser', 'Corneille Wellens']
print(', '.join(extract_names))
# outputs
# Jérome Dekeyser, Corneille Wellens
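Where the junk in front of a name is itself made of letters (the x in text5, NSIe in text6), a pattern built on \w+ has no way to tell it apart from a name word. Below is a minimal sketch of a non-regex alternative, not a definitive solution. The helper name extract_pair is hypothetical, and the approach rests on three assumptions taken from the sample data only: each line ends its pair with a ?, the two names are separated by +, and every name starts with a title-case word (one capital followed by lowercase letters).

```python
def extract_pair(text):
    """Heuristic sketch: return the names found in one noisy 'A + B?' line."""
    # Assumption: everything after the first '?' is OCR noise.
    head = text.split('?', 1)[0]
    names = []
    for part in head.split('+'):
        # Assumption: a backslash glued to a word is noise, so strip it.
        words = [w.strip('\\') for w in part.split()]
        words = [w for w in words if w]
        # Drop leading tokens until one looks like a name (title case).
        # '%', '4', 'x' and 'NSIe' all fail str.istitle().
        while words and not words[0].istitle():
            words.pop(0)
        if words:
            names.append(' '.join(words))
    return names


texts = [
    "% 4 Jérome Dekeyser + Corneille Wellens? ",
    "7 4 Albert Lammens + Paul de Borman?",
    "x Frans Gommers + Jeroen Simaeys?",
    " NSIe Darryl Johansen +George Xie? ",
    r"Joseph Cludts + Herman \Verbauwen? ",  # raw string keeps the backslash
]

for t in texts:
    print(', '.join(extract_pair(t)))
```

Lowercase particles inside a name (the de in Paul de Borman) survive because only leading tokens are dropped. The trade-off is that a name beginning with a lowercase particle, or a noise token that happens to be title case, would be handled wrongly.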