How to remove part of a DataFrame column value based on another column

Asked: 2018-05-08 09:23:07

Tags: python string pandas dataframe data-analysis

I have a DataFrame like this.

I am trying to remove from the Main column the string given in the substring column.

Main                     substring
Sri playnig well cricket cricket
sri went out             NaN
Ram is in                NaN
Ram went to UK,US        UK,US

My expected result is:

Main                     substring
Sri playnig well         cricket
sri went out             NaN
Ram is in                NaN
Ram went to              UK,US

I tried df["Main"].str.reduce(df["substring"]) but it did not work. Please help.

2 Answers:

Answer 0 (score: 2)

This one-liner should do it:

df.loc[df['substring'].notnull(), 'Main'] = df.loc[df['substring'].notnull()].apply(lambda x: x['Main'].replace(x['substring'], ''), axis=1)
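
Note that str.replace leaves the space that preceded the removed substring, so 'Sri playnig well cricket' becomes 'Sri playnig well ' with a trailing space. The following is a minimal sketch of the same idea, assuming the DataFrame from the question, that computes the mask once and strips the leftover whitespace. The variable name mask is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Main': ['Sri playnig well cricket', 'sri went out',
             'Ram is in', 'Ram went to UK,US'],
    'substring': ['cricket', np.nan, np.nan, 'UK,US'],
})

# Rows that actually have something to remove.
mask = df['substring'].notna()

# Remove the substring row by row, then strip the whitespace left behind.
df.loc[mask, 'Main'] = df.loc[mask].apply(
    lambda row: row['Main'].replace(row['substring'], '').strip(),
    axis=1,
)

print(df)
```

Rows where substring is NaN are never passed to the lambda, so no NaN check is needed inside it.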

Answer 1 (score: 1)

Here is one way using pd.DataFrame.apply. Note that np.nan == np.nan evaluates to False, and we can use this trick inside the function to decide when to apply the removal logic.

import pandas as pd, numpy as np

df = pd.DataFrame({'Main': ['Sri playnig well cricket', 'sri went out',
                            'Ram is in', 'Ram went to UK,US'],
                   'substring': ['cricket', np.nan, np.nan, 'UK,US']})

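def remover(row):
    sub = row['substring']
    # NaN is the only value that is not equal to itself, so this detects a missing substring.
    if sub != sub:
        return row['Main']
    else:
        # Drop every whitespace-separated token that equals the substring.
        lst = row['Main'].split()
        return ' '.join([i for i in lst if i != sub])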

df['Main'] = df.apply(remover, axis=1)

print(df)

               Main substring
0  Sri playnig well   cricket
1      sri went out       NaN
2         Ram is in       NaN
3       Ram went to     UK,US
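
One limitation of the token comparison above is that it only removes a substring that matches a single whitespace-separated word, so a multi-word substring such as 'went out' would be left in place. The following is a rough sketch of an alternative, not part of the original answer, that uses str.replace and an explicit pd.isna check. The helper name remove_substring is hypothetical.

```python
import numpy as np
import pandas as pd

def remove_substring(main, sub):
    """Return main with sub removed; leave main unchanged when sub is missing."""
    if pd.isna(sub):
        return main
    # str.replace removes every occurrence, including multi-word substrings;
    # split/join then collapses the whitespace left behind.
    return ' '.join(main.replace(sub, '').split())

# Sample data copied from the question.
df = pd.DataFrame({
    'Main': ['Sri playnig well cricket', 'sri went out',
             'Ram is in', 'Ram went to UK,US'],
    'substring': ['cricket', np.nan, np.nan, 'UK,US'],
})

# Pair each Main value with its substring and rebuild the column.
df['Main'] = [remove_substring(m, s)
              for m, s in zip(df['Main'], df['substring'])]

print(df)
```

The trade-off is that str.replace also matches inside longer words (for example 'US' inside 'USA'), whereas the token comparison only removes exact word matches. Which behaviour is preferable depends on the data.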