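I have a pandas DataFrame df whose IDs are stored as strings. I am trying to create the new_claim and new_description columns.
The closest question I have found is "Efficiently replace part of value from one column with value from another column in pandas using regex?", but that answer relies on splitting the string, and because the description text varies I cannot generalize it.
I can do a one-off replacement: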
date_reg = re.compile(r'\b'+df['old_id'][1]+r'\b')
df['new_claim'] = df['claim'].replace(to_replace=date_reg, value=df['external_id'], inplace=False)
But if I have
date_reg = re.compile(r'\b'+df['claim']+r'\b')
then I get "TypeError: 'Series' objects are mutable, thus they cannot be hashed".
Another approach I tried:
df['new_claim'] = df['claim']
for i in range(5):
    old_id = df['old_id'][i]
    new_id = df['external_id'][i]
    df['new_claim'][i] = df['claim'][i].replace(to_replace=old_id,value=new_id)
gives "TypeError: replace() takes no keyword arguments".
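Both errors come from mixing up pandas and plain-string APIs: re.compile() needs a single str rather than a whole Series, and df['claim'][i] is a Python str, whose replace() only takes positional arguments. A minimal row-wise sketch is shown here, assuming a small made-up frame (the values are hypothetical; only the column names come from the question) and assuming the old ID should match as a whole word:

```python
import re
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
df = pd.DataFrame({
    "old_id": ["123", "45"],
    "external_id": ["A900", "B77"],
    "claim": ["Claim ID: 123", "Claim ID: 45 (see 145)"],
})

def swap_id(row):
    # Build the pattern from this row's string, not from the whole Series.
    pattern = r"\b" + re.escape(row["old_id"]) + r"\b"
    # A function replacement inserts external_id literally (no backslash processing).
    return re.sub(pattern, lambda m: row["external_id"], row["claim"])

# axis=1 calls swap_id once per row, so each row uses its own ID pair.
df["new_claim"] = df.apply(swap_id, axis=1)
print(df["new_claim"].tolist())
```

The word boundaries keep "45" from matching inside "145". apply with axis=1 is not fast on large frames, but it guarantees that each row is only rewritten with its own old_id and external_id.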
Answer 0 (score: 1)
Using only the pandas replace() method (Series.replace):
df.old_id = df.old_id.fillna(0).astype('int')
list_old = list(map(str, df.old_id.tolist()))
list_new = list(map(str, df.external_id.tolist()))
df['new_claim'] = df.claim.replace(to_replace=['Claim ID: ' + e for e in list_old], value=['Claim ID: ' + e for e in list_new], regex=True)
df['new_description'] = df.description.replace(to_replace=[r'\* ' + e + '\\n' for e in list_old], value=['* ' + e + '\\n' for e in list_new], regex=True)
This produces the following output:
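The original sample data is not reproduced here, so as a hedged illustration only, the list-based replace can be tried on a small made-up frame. The values are hypothetical, and re.escape plus a trailing \b are additions that are not in the answer's code:

```python
import re
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
df = pd.DataFrame({
    "old_id": ["123", "45"],
    "external_id": ["900", "77"],
    "claim": ["Claim ID: 123", "Claim ID: 45"],
})

# One regex per old ID; \b stops "45" from also matching "456".
patterns = ["Claim ID: " + re.escape(e) + r"\b" for e in df["old_id"]]
values = ["Claim ID: " + e for e in df["external_id"]]

# Every pattern is tried against every row, not only the row it came from.
df["new_claim"] = df["claim"].replace(to_replace=patterns, value=values, regex=True)
print(df["new_claim"].tolist())
```

Because each pattern is applied to the whole column, this approach assumes an old ID never appears in another row's claim text and that no external_id is also some row's old_id. If that cannot be guaranteed, a per-row replacement is safer.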