I have a DataFrame with two columns, group_code and ind_code, where ind_code holds free-text values such as 'Credit pay', 'PAYMENT', 'Rev12' and '13 rev'.
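Now I need to group all similar words in "ind_code" under a single label:
Every "ind_code" value that contains "pay" (in any letter case, whether at the start, in the middle or at the end) should be replaced with "Payment". Likewise, every value that contains "rev" should be replaced with "Rev".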
Answer 0 (score: 0)
If you only want to replace the values that match a given pattern, you can use this snippet:
df.loc[df.ind_code.str.contains(r'[Pp][Aa][Yy]'),'ind_code']='Payment'
df.loc[df.ind_code.str.contains(r'[Rr][Ee][Vv]'),'ind_code']='Rev'
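The character classes above can also be expressed with the case and na parameters of str.contains. The following is a minimal sketch rather than part of the original answer; the sample rows, including the missing value, are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the None entry illustrates missing-value handling.
df = pd.DataFrame({
    "group_code": ["111", "111", "222", "222", "333"],
    "ind_code": ["Credit pay", "PAYMENT", "Rev12", "13 rev", None],
})

# case=False makes the match case-insensitive;
# na=False treats missing values as non-matches so the mask stays boolean.
is_pay = df["ind_code"].str.contains("pay", case=False, na=False)
is_rev = df["ind_code"].str.contains("rev", case=False, na=False)

df.loc[is_pay, "ind_code"] = "Payment"
df.loc[is_rev, "ind_code"] = "Rev"
print(df)
```

Without na=False the mask would contain missing values for the empty row, and boolean indexing with such a mask fails.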
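Edit
# expand=False returns a Series; values containing neither 'pay' nor 'rev' become NaN.
df['ind_code'] = df.ind_code.str.extract(r'(?i)(pay|rev)', expand=False).str.lower().map({'pay': 'Payment', 'rev': 'Rev'})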
Output:
    group_code  ind_code
0          111   Payment
1          111   Payment
2          111   Payment
3          111   Payment
4          111   Payment
5          111   Payment
6          111   Payment
7          222       Rev
8          222       Rev
9          222       Rev
10         222       Rev
11         222       Rev
12         222       Rev
Answer 1 (score: 0)
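def replace_(value):
    # Compare in lowercase so the match is case-insensitive.
    lowered = value.lower()
    if 'pay' in lowered:
        return 'Payment'
    if 'rev' in lowered:
        return 'Rev'
    return value  # leave non-matching values unchanged

df['ind_code'] = df['ind_code'].apply(replace_)
print(df)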
output:
    group_code  ind_code
0          111   Payment
1          111   Payment
2          111   Payment
3          111   Payment
4          111   Payment
5          111   Payment
6          111   Payment
7          222       Rev
8          222       Rev
9          222       Rev
10         222       Rev
11         222       Rev
12         222       Rev
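One thing to watch with this approach is that .lower() raises on non-string values such as NaN or None. A possible variant that guards against this is sketched below; the names LABELS and normalize_code and the sample values are hypothetical, not taken from the answer.

```python
import pandas as pd

# Hypothetical lookup table: keyword to search for -> replacement label.
LABELS = {"pay": "Payment", "rev": "Rev"}

def normalize_code(value):
    """Return the label of the first keyword found in value, else value unchanged."""
    if not isinstance(value, str):
        # Missing values (NaN, None) have no .lower(); pass them through.
        return value
    lowered = value.lower()
    for keyword, label in LABELS.items():
        if keyword in lowered:
            return label
    return value

# Made-up sample values, including one missing entry.
codes = pd.Series(["bill payment", "REV 17", None, "other"])
result = codes.map(normalize_code)
print(result)
```

Keeping the keywords in a dictionary means a new group can be added without touching the function body.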
Answer 2 (score: 0)
You can do this with regular expressions:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "group_code": ['111', '111', '111', '111', '111', '111', '111',
                   '222', '222', '222', '222', '222', '222'],
    "ind_code": ['Credit pay', 'PAYMENT', 'loan payment', 'bill payment',
                 'pays', 'PayMent', 'Payer', 'Rev12',
                 'Rev11', '13 rev', 'Rev13', 'Rev .!', 'REV 17']})

# No capture groups in the patterns: str.contains warns when a pattern has match groups.
conditions = [df['ind_code'].str.contains('pay', case=False),
              df['ind_code'].str.contains('rev', case=False)]
choices = ['Payment', 'Rev']  # labels requested in the question
# Rows matching neither condition get 'unclear'.
df['result'] = np.select(conditions, choices, default='unclear')
print(df)
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.contains.html
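If values that match neither pattern should keep their original text instead of becoming 'unclear', the original column can be passed as the default of np.select. This is a sketch under that assumption; the extra 'other' row is invented to show the fallback.

```python
import numpy as np
import pandas as pd

# Sample values from the answer above, plus one hypothetical non-matching row.
df = pd.DataFrame({
    "group_code": ["111", "111", "222", "222", "333"],
    "ind_code": ["Credit pay", "PayMent", "13 rev", "REV 17", "other"],
})

conditions = [
    df["ind_code"].str.contains("pay", case=False),
    df["ind_code"].str.contains("rev", case=False),
]
choices = ["Payment", "Rev"]

# default is taken element-wise where no condition is True,
# so non-matching rows keep their original value.
df["result"] = np.select(conditions, choices, default=df["ind_code"].to_numpy())
print(df)
```

When several conditions are True for the same row, np.select uses the first one in the list, so the order of conditions decides ties.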