Background
This question is a variation of "Alter text in pandas column based on names".
I have the following df, which deliberately contains various issues:
import pandas as pd
df = pd.DataFrame({'Text' : ['But now Smith,J J is Here from Smithsville',
'Maryland is HYDER,A MARY Found here ',
'hey here is Annual Doe,Jane Ann until ',
'The tuckered was Tucker,Tom is Not here but'],
'P_ID': [1,2,3,4],
'P_Name' : ['SMITH,J J', 'HYDER,A MARY', 'DOE,JANE ANN', 'TUCKER,TOM T'],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
Output
N_ID P_ID P_Name Text
0 A1 1 SMITH,J J But now Smith,J J is Here from Smithsville
1 A2 2 HYDER,A MARY Maryland is HYDER,A MARY Found here
2 A3 3 DOE,JANE ANN hey here is Annual Doe,Jane Ann until
3 A4 4 TUCKER,TOM T The tuckered was Tucker,Tom is Not here but
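Goal
1) For each name in P_Name, e.g. SMITH,J J, replace the name with **BLOCK** wherever it appears in the corresponding Text column
2) Create a New_Text column holding the result
Desired output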
N_ID P_ID P_Name Text New_Text
0 But now **BLOCK** is Here from Smithsville
1 Maryland is **BLOCK** Found here
2 hey here is Annual **BLOCK** until
3 The tuckered was **BLOCK** is Not here but
Question
How do I achieve the desired output?
Answer 0 (score: 1)
This should work:
df['New_Text'] = df.apply(lambda x:x['Text'].lower().replace(x['P_Name'].lower(), '**BLOCK**'), axis=1)
Your example has some whitespace issues, but this should work with a properly constructed example:
0 but now **BLOCK** is here from smithsville
1 maryland is **BLOCK** found here
2 hey here is annual **BLOCK** until
3 the tuckered was tucker, tom is not here but
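The lowercase approach above also lowercases the rest of the text, whereas the desired output keeps the original casing. A minimal sketch of a case-insensitive, row-wise replacement, assuming Python's standard re module is acceptable here; the helper name block_name is hypothetical and the data is a shortened copy of the question's frame:

```python
import re
import pandas as pd

# Shortened copy of the question's data (two rows only)
df = pd.DataFrame({
    'Text': ['But now Smith,J J is Here from Smithsville',
             'The tuckered was Tucker,Tom is Not here but'],
    'P_Name': ['SMITH,J J', 'TUCKER,TOM T'],
})

def block_name(row):
    # re.escape makes the name match literally; IGNORECASE matches any casing
    pattern = re.escape(row['P_Name'])
    return re.sub(pattern, '**BLOCK**', row['Text'], flags=re.IGNORECASE)

# Apply row by row so each text is only checked against its own name
df['New_Text'] = df.apply(block_name, axis=1)
print(df['New_Text'].tolist())
```

The second row stays unchanged because its text contains only "Tucker,Tom", not the full "TUCKER,TOM T" stored in P_Name.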
Answer 1 (score: 1)
If you remove the whitespace issues, use the replace function with regex=True:
# new data frame without the whitespace inconsistencies
df = pd.DataFrame({'Text' : ['But now Smith,J J is Here from Smithsville',
'Maryland is HYDER,A MARY Found here ',
'hey here is Annual Doe,Jane Ann until ',
'The tuckered was Tucker,Tom T is Not here but'],
'P_ID': [1,2,3,4],
'P_Name' : ['SMITH,J J', 'HYDER,A MARY', 'DOE,JANE ANN', 'TUCKER,TOM T'],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
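# pass the patterns as a list: a Series as to_replace combined with a value
# can raise ValueError in newer pandas releases
print(df.Text.str.lower().replace(df.P_Name.str.lower().tolist(), '**BLOCK**', regex=True))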
0 but now **BLOCK** is here from smithsville
1 maryland is **BLOCK** found here
2 hey here is annual **BLOCK** until
3 the tuckered was **BLOCK** is not here but
Name: Text, dtype: object
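If the whitespace inconsistencies cannot be cleaned up beforehand, one option is to build a per-name pattern that tolerates them. The sketch below assumes names follow the form LAST,FIRST [MIDDLE] and allows any whitespace around the comma and between the given names; name_pattern is a hypothetical helper and the sample rows are made up to show the spacing variations:

```python
import re
import pandas as pd

def name_pattern(name):
    # Split 'DOE,JANE ANN' into ['DOE', 'JANE', 'ANN'] on commas and whitespace
    parts = [p for p in re.split(r'[,\s]+', name.strip()) if p]
    last, rest = parts[0], parts[1:]
    # Optional whitespace around the comma, at least one space between given names
    pattern = (re.escape(last) + r'\s*,\s*'
               + r'\s+'.join(re.escape(p) for p in rest))
    return re.compile(pattern, flags=re.IGNORECASE)

# Illustrative rows with irregular spacing (not the question's exact data)
df = pd.DataFrame({
    'Text': ['hey here is Annual Doe , Jane  Ann until',
             'Maryland is HYDER,A MARY Found here'],
    'P_Name': ['DOE,JANE ANN', 'HYDER,A MARY'],
})

# Pair each text with its own name so patterns never leak across rows
df['New_Text'] = [name_pattern(n).sub('**BLOCK**', t)
                  for n, t in zip(df['P_Name'], df['Text'])]
print(df['New_Text'].tolist())
```

Because each row is matched only against its own name, this keeps the original casing of the surrounding text and avoids one person's name being blocked in another person's row.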