I have a DataFrame like the following:
index status
1 IPAMR-104.129.194.150-104.129.194.161;Clayment-STARR-65.115.39.42
2 Noti8nalMI-64.73.114.92-127.0.0.1
3 HSO_fm-dev-apps255-128.11.45.165
I want to remove all the unwanted characters, keep only the IP addresses, and output them as shown below. I tried
rs = df.replace(r'[^\d.;-]+','',regex=True)
but had no luck.
index status
1 104.129.194.150;104.129.194.161;65.115.39.42
2 64.73.114.92;127.0.0.1
3 128.11.45.165
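To see why the attempted pattern falls short, here is a small sketch that applies the same character class to the three sample strings with the standard-library `re.sub`. It assumes that `df.replace(..., regex=True)` performs the same per-string substitution; the names `rows` and `cleaned` are only illustrative. The class keeps every digit, dot, semicolon and hyphen, so stray digits from the labels (the `8` in `Noti8nalMI`, the `255` in `apps255`) and the separating hyphens all survive.

```python
import re

# The three sample strings from the question.
rows = [
    "IPAMR-104.129.194.150-104.129.194.161;Clayment-STARR-65.115.39.42",
    "Noti8nalMI-64.73.114.92-127.0.0.1",
    "HSO_fm-dev-apps255-128.11.45.165",
]

# Same pattern as in the attempt: delete everything that is NOT
# a digit, a dot, a semicolon or a hyphen.
pattern = r"[^\d.;-]+"
cleaned = [re.sub(pattern, "", row) for row in rows]

for before, after in zip(rows, cleaned):
    print(f"{before!r} -> {after!r}")
```

Deleting characters cannot tell an address digit from a label digit, which is why matching the addresses themselves is the more reliable direction.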
Answer 0: (score: 3)
We can use findall:
df.status=df.status.str.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}').str.join(',')
Out[122]:
0 104.129.194.150,104.129.194.161,65.115.39.42
1 64.73.114.92,127.0.0.1
2 128.11.45.165
Name: status, dtype: object
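The output above is joined with commas, while the question asks for semicolons. A self-contained sketch of the same findall approach with `';'` as the separator might look like this; the DataFrame is rebuilt from the sample data in the question, and the name `ip_pattern` and the compact `(?:...){3}` form of the pattern are my own choices rather than part of the answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "index": [1, 2, 3],
    "status": [
        "IPAMR-104.129.194.150-104.129.194.161;Clayment-STARR-65.115.39.42",
        "Noti8nalMI-64.73.114.92-127.0.0.1",
        "HSO_fm-dev-apps255-128.11.45.165",
    ],
})

# Four groups of 1-3 digits separated by dots. The group is
# non-capturing so that findall returns the whole match.
ip_pattern = r"\d{1,3}(?:\.\d{1,3}){3}"

# findall yields a list of matches per row; str.join glues each list.
result = df["status"].str.findall(ip_pattern).str.join(";")
print(result.tolist())
```

Note that this pattern only checks the dotted shape; it does not verify that each octet is in the range 0 to 255.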
Answer 1: (score: 3)
You can use str.extractall first, and then groupby:
df['status'] = (df.status.str.extractall(r'(\d+\.\d+\.\d+\.\d+)')[0]
                  .groupby(level=0).agg(';'.join)
               )
Output:
index status
0 1 104.129.194.150;104.129.194.161;65.115.39.42
1 2 64.73.114.92;127.0.0.1
2 3 128.11.45.165
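One thing to be aware of with this approach: extractall only returns rows that have at least one match, so a row without any address disappears from the grouped result and would become NaN when assigned back. The sketch below illustrates this with a made-up second row (`"no-address-here"` is hypothetical, not from the question) and uses reindex to restore such rows as empty strings; whether an empty string or NaN is preferable depends on your data.

```python
import pandas as pd

# One real sample row plus a hypothetical row with no address in it.
s = pd.Series(
    ["Noti8nalMI-64.73.114.92-127.0.0.1", "no-address-here"],
    name="status",
)

# extractall returns one row per match, indexed by (original row, match number).
matches = s.str.extractall(r"(\d+\.\d+\.\d+\.\d+)")[0]

# Group by the original row label and join that row's matches.
joined = matches.groupby(level=0).agg(";".join)

# Row 1 had no match, so it is absent from `joined`;
# reindex brings it back with an empty string.
filled = joined.reindex(s.index, fill_value="")
print(filled.tolist())
```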