Background
The code below is slightly modified from "skipping empty list and continuing with function".
import pandas as pd
Names = [list(['Jon', 'Smith', 'jon', 'John']),
list([]),
list(['Bob', 'bobby', 'Bobs']),
list([]),
list([])]
df = pd.DataFrame({'Text' : ['Jon J Smith is Here and jon John from ',
'get nothing from here',
'I like Bob and bobby and also Bobs diner ',
'nothing here too',
'same here'
],
'P_ID': [1,2,3, 4,5],
'P_Name' : Names
})
#rearrange columns
df = df[['Text', 'P_ID', 'P_Name']]
df
   Text                                       P_ID  P_Name
0  Jon J Smith is Here and jon John from      1     [Jon, Smith, jon, John]
1  get nothing from here                      2     []
2  I like Bob and bobby and also Bobs diner   3     [Bob, bobby, Bobs]
3  nothing here too                           4     []
4  same here                                  5     []
Working code
The code below is taken from "skipping empty list and continuing with function":
m = df['P_Name'].str.len().ne(0)
df.loc[m, 'New'] = df.loc[m, 'Text'].replace(df.loc[m].P_Name,'**PHI**',regex=True)
It produces the following New column in df:
Text P_ID P_Name New
0 **PHI** J **PHI** is Here and **PHI** **PHI** ...
1 NaN
2 I like **PHI** and **PHI** and also **PHI**s d..
3 NaN
4 NaN
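The NaN rows come from the masked assignment itself: writing through df.loc[m, 'New'] creates the column but only fills the rows where m is True. A minimal sketch of that behaviour, using a small made-up frame and str.upper() as a stand-in for the real replacement (both are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical three-row frame; the middle row has an empty name list.
df = pd.DataFrame({
    'Text': ['Jon is here', 'nothing here', 'Bob too'],
    'P_Name': [['Jon'], [], ['Bob']],
})

# True only for rows whose P_Name list is non-empty.
m = df['P_Name'].str.len().ne(0)

# Stand-in transformation: only the masked rows receive a value,
# so the new column is left as NaN everywhere else.
df.loc[m, 'New'] = df.loc[m, 'Text'].str.upper()

print(df[['Text', 'New']])
```

Row 1 has no names, so it is excluded by the mask and its New value stays missing.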
Desired output
However, I would like to keep the original text (e.g. get nothing from here) instead of NaN in rows 1, 3, and 4, as shown below:
Text P_ID P_Name New
0 **PHI** J **PHI** is Here and **PHI** **PHI** ...
1 get nothing from here
2 I like **PHI** and **PHI** and also **PHI**s d..
3 nothing here too
4 same here
Question
How do I adjust the code below to achieve the desired output?
m = df['P_Name'].str.len().ne(0)
df.loc[m, 'New'] = df.loc[m, 'Text'].replace(df.loc[m].P_Name,'**PHI**',regex=True)
Answer 0 (score: 2)
Just add a line using fillna at the end.
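Answer 1 (score: 1)
@tawab_shakeel is close. Just add:
# assign the result back; inplace=True on a selected column is unreliable in newer pandas
df['New'] = df['New'].fillna(df['Text'])
fillna takes the value from df['Text'] in the same row wherever df['New'] is NaN.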
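To see the fillna step in isolation, here is a small sketch. The New column is typed in by hand as a stand-in for the output of the masked replace (an assumption, so that the example does not depend on that call), and NaN marks the rows that had no names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Text': ['Jon J Smith is Here and jon John from ',
             'get nothing from here',
             'I like Bob and bobby and also Bobs diner ',
             'nothing here too',
             'same here'],
    # Hand-written stand-in for the partially filled column.
    'New': ['**PHI** J **PHI** is Here and **PHI** **PHI** from ',
            np.nan,
            'I like **PHI** and **PHI** and also **PHI**s diner ',
            np.nan,
            np.nan],
})

# fillna aligns on the index, so each NaN is filled from the same row of Text.
df['New'] = df['New'].fillna(df['Text'])

print(df['New'].tolist())
```

Rows that already had a replaced value are left untouched; only the missing ones fall back to the original text.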
I can also offer an alternative solution using regular expressions from the re module.
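import re

def replacing(x):
    # Only build a pattern when the row actually has names to replace.
    if len(x['P_Name']) > 0:
        return re.sub('|'.join(x['P_Name']), '**PHI**', x['Text'])
    else:
        # Empty list: keep the original text unchanged.
        return x['Text']

df['New'] = df.apply(replacing, axis=1)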
The apply method with axis=1 calls the replacing function on each row, and the substitution itself is done by re.sub.
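One caveat with joining the names with '|' is that any name containing a regex metacharacter (such as . or +) would be interpreted as pattern syntax. The sketch below shows the same idea on plain strings with re.escape applied to each name; the helper name mask_names is hypothetical and not part of the answer above:

```python
import re

def mask_names(text, names, token='**PHI**'):
    """Replace every occurrence of any name in `names` with `token`."""
    # With no names, '|'.join(...) would be an empty pattern, which matches
    # at every position, so return the text unchanged instead.
    if not names:
        return text
    # Escape each name so characters like '.' are matched literally.
    pattern = '|'.join(re.escape(name) for name in names)
    return re.sub(pattern, token, text)

print(mask_names('Jon J Smith is Here and jon John from ',
                 ['Jon', 'Smith', 'jon', 'John']))
print(mask_names('get nothing from here', []))
print(mask_names('AxB and A.B', ['A.B']))
```

It could be used row by row in the same way as replacing, for example via df.apply(lambda x: mask_names(x['Text'], x['P_Name']), axis=1).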