Changing text in a pandas column based on names

Time: 2019-07-14 17:12:47

Tags: regex python-3.x pandas text nlp

Background

I have the following sample df

import pandas as pd

df = pd.DataFrame({
    'Text': ['Jon J Smith is Here from **PHI** until **PHI**',
             'No P_Name Found here',
             'Jane Ann Doe is Also here until **PHI** ',
             '**PHI** was **PHI** Tom Tucker is Not here but **PHI** '],
    'P_ID': [1, 2, 3, 4],
    'P_Name': ['Smith, Jon J', 'Rider, Mary', 'Doe, Jane Ann', 'Tucker, Tom'],
    'N_ID': ['A1', 'A2', 'A3', 'A4'],
})

# Rearrange columns
df = df[['Text', 'N_ID', 'P_ID', 'P_Name']]
df


                         Text                       N_ID    P_ID    P_Name
0   Jon J Smith is Here from **PHI** until **PHI**  A1        1 Smith, Jon J
1   No P_Name Found here                            A2        2 Rider, Mary
2   Jane Ann Doe is Also here until **PHI**         A3        3 Doe, Jane Ann
3   **PHI** was **PHI** Tom Tucker is Not here but  A4        4 Tucker, Tom

Goal

1) In the Text column, replace the text that corresponds to the value found in P_Name (e.g. Jon J Smith) with **PHI**

Desired output

                         Text                       N_ID    P_ID    P_Name
0   **PHI** is Here from **PHI** until **PHI**      A1        1 Smith, Jon J
1   No P_Name Found here                            A2        2 Rider, Mary
2   **PHI** is Also here until **PHI**              A3        3 Doe, Jane Ann
3   **PHI** was **PHI** **PHI** is Not here but     A4        4 Tucker, Tom

The desired output can either go in the same Text column or in a newly created new_col

Question

How do I achieve the desired output?

1 answer:

Answer 0: (score: 2)

One way:

>>> df['Text'].replace(df['P_Name'].str.split(', *').apply(lambda l: ' '.join(l[::-1])),'**PHI**',regex=True)
0           **PHI** is here from **PHI** until **PHI**
1                                 No P_Name found here
2                  **PHI** is also here until **PHI**
3    **PHI** was **PHI** **PHI** is not here but **...

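You can do this in place with inplace=True, or create a new column by assigning the expression above to df['new_col']. It splits the P_Name column on the comma, joins the pieces in reverse order with a space (so 'Smith, Jon J' becomes 'Jon J Smith'), and then replaces each resulting name in the Text column with **PHI**.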
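If it helps to see those steps spelled out, here is a minimal row-by-row sketch. It is not the one-liner above but a plain-Python variant of the same idea, and it assumes every P_Name is written as 'Last, First'. The helper names to_first_last and mask_name are made up for illustration. Unlike a column-wide replace, each row is only checked against its own P_Name, and the name is matched as literal text rather than as a regex.

```python
import pandas as pd

df = pd.DataFrame({
    'Text': ['Jon J Smith is Here from **PHI** until **PHI**',
             'No P_Name Found here',
             'Jane Ann Doe is Also here until **PHI** ',
             '**PHI** was **PHI** Tom Tucker is Not here but **PHI** '],
    'P_Name': ['Smith, Jon J', 'Rider, Mary', 'Doe, Jane Ann', 'Tucker, Tom'],
})


def to_first_last(p_name):
    """Hypothetical helper: turn 'Last, First' into 'First Last'.

    Assumes exactly the 'Last, First' layout used in the question.
    """
    last, first = [part.strip() for part in p_name.split(',', 1)]
    return f'{first} {last}'


def mask_name(row):
    """Hypothetical helper: replace this row's own name in Text with **PHI**."""
    name = to_first_last(row['P_Name'])
    # Plain str.replace treats the name literally, so no regex escaping is needed.
    return row['Text'].replace(name, '**PHI**')


# Write to a new column; assign to df['Text'] instead to overwrite the original.
df['new_col'] = df.apply(mask_name, axis=1)
print(df['new_col'].tolist())
```

Rows whose Text does not contain the name (row 1 here) come back unchanged, and any trailing spaces in the original strings are preserved.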