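I am trying to extract parts of the "Message Text" column, specifically the name (which follows the word "Admitted") and the card number (inside the parentheses), and put the results into new columns. What is the best way to do this? I tried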
access_file['Name']=access_file['Message Text'].str.extract('(.*?)')
but the resulting column is blank.
Thanks
   Message Type   Server Date/Time  Message Text                                                                Message Date/Time
0  Card Admitted  7/25/2018 8:10    Admitted 'Santos, Samuel' (Card: 203532) at '2nd Flr Check Rm 02-19' (IN).  7/25/2018 8:10
1  Card Admitted  7/25/2018 9:10    Admitted 'Zhu, Jin Chang' (Card: 203929) at '2nd Flr Check Rm 02-19' (IN).  7/25/2018 9:10
2  Card Admitted  7/25/2018 9:34    Admitted 'Zhu, Jin Chang' (Card: 203929) at '2nd Flr Check Rm 02-19' (IN).  7/25/2018 9:34
3  Card Admitted  7/25/2018 9:42    Admitted 'Klein, Erwin' (Card: 511268) at '2nd Flr Check Rm 02-19' (IN).    7/25/2018 9:41
4  Card Admitted  7/25/2018 10:29   Admitted 'Tesis, Olga' (Card: 203047) at '2nd Flr Check Rm 02-19' (IN).     7/25/2018 10:29
Answer 0 (score: 1)
This link may help; it deals with exactly the same problem.
As for the regular expression, you could use:
r".*Admitted\s+\'(?P<Name>[a-zA-Z, ]+)\' \(Card: (?P<digit>\d+)\).*"
Thanks.
Example three at this link says you can do it with a single regular expression, which is more useful and cleaner.
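For what it's worth, here is a small sketch using only Python's standard `re` module on one message string copied from the question. It shows why the original attempt likely came back blank (the pattern actually passed is `(.*?)`, a lazy group with nothing after it, so it is satisfied by the empty string at position 0) and what the named-group pattern above captures. The variable names `message`, `lazy`, and `fields` are just for illustration.

```python
import re

# One message copied from the question's sample data.
message = "Admitted 'Santos, Samuel' (Card: 203532) at '2nd Flr Check Rm 02-19' (IN)."

# The question's pattern: a lazy group followed by nothing is already
# satisfied by zero characters, so it captures an empty string.
lazy = re.search(r"(.*?)", message)
print(repr(lazy.group(1)))

# The pattern from this answer, with named groups for the two fields.
pattern = re.compile(
    r".*Admitted\s+\'(?P<Name>[a-zA-Z, ]+)\' \(Card: (?P<digit>\d+)\).*"
)
match = pattern.search(message)
fields = match.groupdict()
print(fields)
```

Note that the character class `[a-zA-Z, ]+` only allows letters, commas, and spaces, so a name containing a hyphen or an apostrophe would not match; whether that matters depends on the real data.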
Answer 1 (score: 0)
You can try the following pattern:
# Raw string so that \s, \( and \d reach the regex engine unchanged
pattern = r"Admitted\s+\'(?P<name>.*)\'.*\(Card\D*(?P<card_number>\d+)\)"
df['Message Text'].str.extract(pattern)
Output:
             name card_number
0  Santos, Samuel      203532
1  Zhu, Jin Chang      203929
2  Zhu, Jin Chang      203929
3    Klein, Erwin      511268
4     Tesis, Olga      203047
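Since the question asks for the results to end up in new columns, here is a rough sketch of one way to do that. It uses the question's `access_file` name with two rows copied from the sample data, plus one made-up message that does not match, to show that such rows come back as NaN. The pattern is a slightly tightened variation (`[^']*` instead of `.*` for the name), not the exact one from this answer, and the column names "Name" and "Card Number" are only an assumption about what the asker wants.

```python
import pandas as pd

# Two rows copied from the question, plus one hypothetical
# non-matching message to show how misses are reported.
access_file = pd.DataFrame({
    "Message Text": [
        "Admitted 'Santos, Samuel' (Card: 203532) at '2nd Flr Check Rm 02-19' (IN).",
        "Admitted 'Klein, Erwin' (Card: 511268) at '2nd Flr Check Rm 02-19' (IN).",
        "Door held open",  # hypothetical message, not from the question
    ]
})

# [^']* stops the name at the first closing quote (a variation on the
# answer's pattern, which uses a greedy .* and relies on backtracking).
pattern = r"Admitted\s+'(?P<name>[^']*)'.*?\(Card\D*(?P<card_number>\d+)\)"

# str.extract returns one column per named group; rows without a match get NaN.
extracted = access_file["Message Text"].str.extract(pattern)
access_file["Name"] = extracted["name"]
access_file["Card Number"] = extracted["card_number"]
print(access_file[["Name", "Card Number"]])
```

The card numbers stay as strings here; if numbers are needed, something like `pd.to_numeric(access_file["Card Number"])` could be applied afterwards.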