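I have some very odd data coming into my pandas DataFrame via curl. What I want to do is extract the values from a column, as shown below. Can someone guide me on how to extract this information?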
cc = pd.read_csv(cc_curl)
print(cc['srv_id'])
srv_id
------
TicketID 14593_ServiceID 104731
ServiceID
TicketID 14595_ServiceID 104732
TicketID 14609_ServiceID 0
TicketID 0_ServiceID 178282
Desired output:
srv_id
------
14593 104731
14595 104732
14609
178282
Answer 0 (score: 2)
If you want to extract this information into two new columns, you can do it like this:
import numpy as np
import pandas as pd
In [22]: df[['TicketID','ServiceID']] = (
    ...:     df.srv_id.str.extract(r'TicketID\s+(\d+).*?ServiceID\s+(\d+)', expand=True)
    ...:       .replace(r'\b0\b', np.nan, regex=True)
    ...: )
    ...:
In [23]: df
Out[23]:
                            srv_id  TicketID  ServiceID
0  TicketID 14593_ServiceID 104731     14593     104731
1                        ServiceID       NaN        NaN
2  TicketID 14595_ServiceID 104732     14595     104732
3       TicketID 14609_ServiceID 0     14609        NaN
4      TicketID 0_ServiceID 178282       NaN     178282
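For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt by hand from the sample in the question (an assumption about what read_csv returns), named groups are used so that str.extract labels the columns itself, and mask() stands in for the regex replace shown above.

```python
import pandas as pd

# Sample data rebuilt by hand from the question; assumed to mirror the real column.
df = pd.DataFrame({
    "srv_id": [
        "TicketID 14593_ServiceID 104731",
        "ServiceID",
        "TicketID 14595_ServiceID 104732",
        "TicketID 14609_ServiceID 0",
        "TicketID 0_ServiceID 178282",
    ]
})

# Named groups become the column names of the frame returned by str.extract.
pattern = r"TicketID\s+(?P<TicketID>\d+).*?ServiceID\s+(?P<ServiceID>\d+)"
ids = df["srv_id"].str.extract(pattern)

# Treat a literal "0" as missing; rows that did not match are already NaN.
ids = ids.mask(ids == "0")

# Attach the two new columns next to the original one.
df = df.join(ids)
print(df)
```

Note that the extracted values are still strings. If numeric IDs are needed, pd.to_numeric can be applied to each new column afterwards.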
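If you want to replace the string with the extracted numbers instead:
In [161]: # \D+ turns each run of non-digits into a space; the second pattern drops a lone 0
     ...: df['new_srv_id'] = \
     ...:     df.srv_id.replace([r'\D+', r'\s*\b0\b\s*'], [' ', ''], regex=True)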
In [162]: df
Out[162]:
                            srv_id    new_srv_id
0  TicketID 14593_ServiceID 104731  14593 104731
1                        ServiceID
2  TicketID 14595_ServiceID 104732  14595 104732
3       TicketID 14609_ServiceID 0         14609
4      TicketID 0_ServiceID 178282        178282
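As an alternative to chained regex replacements, the same string can be built by collecting every run of digits and keeping the non-zero ones. This is only a sketch: the helper name nonzero_numbers is hypothetical, and the sample data is again rebuilt by hand from the question. It also avoids the leading space that the replacement-based approach leaves in front of the first number.

```python
import re

import pandas as pd

# Sample data rebuilt by hand from the question; assumed to mirror the real column.
df = pd.DataFrame({
    "srv_id": [
        "TicketID 14593_ServiceID 104731",
        "ServiceID",
        "TicketID 14595_ServiceID 104732",
        "TicketID 14609_ServiceID 0",
        "TicketID 0_ServiceID 178282",
    ]
})


def nonzero_numbers(text: str) -> str:
    """Join every digit run in `text` that is not zero, separated by spaces.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    numbers = re.findall(r"\d+", text)
    return " ".join(n for n in numbers if int(n) != 0)


# Apply the helper to each string in the column.
df["new_srv_id"] = df["srv_id"].map(nonzero_numbers)
print(df)
```

Rows with no usable number, such as the bare "ServiceID" row, end up as an empty string here. The sketch assumes the column contains no missing values, since the helper expects a string.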