Using regex to look up column values in another column and replace NaN

Time: 2016-03-03 03:03:19

Tags: python regex pandas

I have a sample pandas DataFrame as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame({
'notes': pd.Series(['meth cook makes meth with purity of over 96%', 'meth cook is also called Heisenberg', 'meth cook has cancer', 'he is known as the best meth cook', 'Meth Dealer added chili powder to his batch', 'Meth Dealer learned to make the best meth', 'everyone goes to this Meth Dealer for best shot', 'girlfriend of the meth dealer died', 'this lawyer is a people pleasing person', 'cinnabon has now hired the lawyer as a baker', 'lawyer had to take off in the end', 'lawyer has a lot of connections who knows other guy']), 
'name': pd.Series([np.nan, 'Walter White', np.nan, np.nan, np.nan, np.nan, 'Jessie Pinkman', np.nan, 'Saul Goodman', np.nan, np.nan, np.nan]), 
'occupation': pd.Series(['meth cook', np.nan, np.nan, np.nan, np.nan, np.nan, 'meth dealer', np.nan, np.nan, 'lawyer', np.nan, np.nan])
})


name                                    notes                                       occupation
NaN                     meth cook makes meth with purity of over 96%                meth cook   
Walter White            meth cook is also called Heisenberg                             NaN
NaN                     meth cook has cancer                                            NaN
NaN                     he is known as the best meth cook                               NaN
NaN                     Meth Dealer added chili powder to his batch                     NaN
NaN                     Meth Dealer learned to make the best meth                       NaN
Jessie Pinkman          everyone goes to this Meth Dealer for best shot             meth dealer
NaN                     girlfriend of the meth dealer died                              NaN
Saul Goodman            this lawyer is a people pleasing person                         NaN
NaN                     cinnabon has now hired the lawyer as a baker                  lawyer
NaN                     lawyer had to take off in the end                               NaN
NaN                     lawyer has a lot of connections who knows other guy             NaN

So we have three occupations in total:

pd.unique(df.occupation.dropna())

array(['meth cook', 'meth dealer', 'lawyer'], dtype=object)

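I want to look up the 'occupation' values in the 'notes' column and, wherever a row's occupation is missing, fill it with the occupation that matches in that row's notes. For example, the occupation is missing in the second row. However, if we search the 'notes' column for ('meth cook', 'meth dealer', 'lawyer'), we see that 'meth cook' appears in the second row's notes. So the missing occupation should be filled with 'meth cook'.

I tried: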

df.occupation[df.occupation.notnull()].apply(lambda x: df.occupation.str.extract('('+x+')'))

However, it doesn't give me the result I want. I'd like to see the following result:

name                                    notes                                       occupation
NaN                     meth cook makes meth with purity of over 96%                meth cook   
Walter White            meth cook is also called Heisenberg                         meth cook
NaN                     meth cook has cancer                                        meth cook
NaN                     he is known as the best meth cook                           meth cook
NaN                     Meth Dealer added chili powder to his batch                 meth dealer
NaN                     Meth Dealer learned to make the best meth                   meth dealer
Jessie Pinkman          everyone goes to this Meth Dealer for best shot             meth dealer
NaN                     girlfriend of the meth dealer died                          meth dealer
Saul Goodman            this lawyer is a people pleasing person                       lawyer
NaN                     cinnabon has now hired the lawyer as a baker                  lawyer
NaN                     lawyer had to take off in the end                             lawyer
NaN                     lawyer has a lot of connections who knows other guy           lawyer

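Can anyone offer any suggestions?

1 answer:

Answer 0: (score: 1)

You can do this with a for loop: use str.contains to find the rows whose notes mention each occupation, then fill the missing values in occupation for those rows: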

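# Distinct non-null occupations to search for
occ = pd.unique(df.occupation[df.occupation.notnull()])

for pa in occ:
    # Rows whose notes mention this occupation (case-insensitive)
    subset = df.notes.str.contains(pa, case=False)
    # Use .loc rather than chained indexing so the assignment reaches df itself
    df.loc[subset, 'occupation'] = df.loc[subset, 'occupation'].fillna(pa)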


In [40]: df
Out[40]:
              name                                              notes    occupation
0              NaN       meth cook makes meth with purity of over 96%     meth cook
1     Walter White                meth cook is also called Heisenberg     meth cook
2              NaN                               meth cook has cancer     meth cook
3              NaN                  he is known as the best meth cook     meth cook
4              NaN        Meth Dealer added chili powder to his batch   meth dealer
5              NaN          Meth Dealer learned to make the best meth   meth dealer
6   Jessie Pinkman    everyone goes to this Meth Dealer for best shot   meth dealer
7              NaN                 girlfriend of the meth dealer died   meth dealer
8     Saul Goodman            this lawyer is a people pleasing person        lawyer
9              NaN       cinnabon has now hired the lawyer as a baker        lawyer
10             NaN                  lawyer had to take off in the end        lawyer
11             NaN  lawyer has a lot of connections who knows othe...        lawyer
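
As a possible alternative to the loop, the same fill can be sketched in a single vectorized pass with str.extract. This is only a sketch under a few assumptions that are not part of the original answer: the small frame below is a subset of the question's data, the names known, pattern and found are illustrative, and the existing occupation values are assumed to be lower case (so the extracted text is lower-cased to match them).

```python
import re

import numpy as np
import pandas as pd

# Illustrative subset of the question's data (not the full frame).
df = pd.DataFrame({
    'notes': [
        'meth cook is also called Heisenberg',
        'Meth Dealer added chili powder to his batch',
        'everyone goes to this Meth Dealer for best shot',
        'cinnabon has now hired the lawyer as a baker',
        'lawyer had to take off in the end',
        'he is known as the best meth cook',
    ],
    'occupation': [np.nan, np.nan, 'meth dealer', 'lawyer', np.nan, 'meth cook'],
})

# Known occupations, escaped so each one is matched literally.
known = df['occupation'].dropna().unique()
pattern = '(' + '|'.join(re.escape(o) for o in known) + ')'

# Pull the first known occupation mentioned in each note, ignoring case,
# then lower-case it (assumption: stored occupations are lower case).
found = df['notes'].str.extract(pattern, flags=re.IGNORECASE, expand=False).str.lower()

# fillna only touches rows where occupation is missing.
df['occupation'] = df['occupation'].fillna(found)
```

One difference from the loop: if a note mentions more than one occupation, this version takes the one that appears first in the text, whereas the loop fills with whichever occupation comes first in occ.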