How to change a cell's value in one column when it duplicates the value in another column

Posted: 2018-08-17 12:59:27

Tags: python pandas

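I have a pandas DataFrame with address fields spread across several columns. My problem is that in two of those columns, some rows hold the same value in both cells. Does anyone know how to conditionally change the value in one column when the two columns contain duplicates? Ideally I want to keep one value and set the other to np.nan.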

Here is a test case:

from io import StringIO
import pandas as pd

test = pd.read_json(StringIO('{"housename":{"16":null,"17":null,"18":null},"name":{"16":"Shoecare","17":"33","18":"33A"},"house_number":{"16":"32","17":"33","18":"33A"},"street":{"16":"Carfax","17":"Carfax","18":"Carfax"},"city":{"16":"Horsham","17":"Horsham","18":"Horsham"},"postcode":{"16":"RH12 1EE","17":"RH12 1EE","18":"RH12 1EE"}}'))

    city        house_number    housename   name        postcode    street
16  Horsham     32              NaN         Shoecare    RH12 1EE    Carfax
17  Horsham     33              NaN         33          RH12 1EE    Carfax
18  Horsham     33A             NaN         33A         RH12 1EE    Carfax

On the test case I experimented with test.duplicated(subset=['house_number', 'name']), but it does not identify the duplicated values between the house_number and name columns.
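A likely reason duplicated() misses these: it checks whether a row's (house_number, name) pair repeats a pair from an earlier row, not whether the two cells within one row are equal. The sketch below rebuilds the test frame directly with pd.DataFrame (assumed equivalent to the JSON above, trimmed of nothing important) and contrasts the two checks.

```python
import numpy as np
import pandas as pd

# Rebuild of the question's test frame, assumed equivalent to the read_json input
test = pd.DataFrame(
    {
        "housename": [np.nan, np.nan, np.nan],
        "name": ["Shoecare", "33", "33A"],
        "house_number": ["32", "33", "33A"],
        "street": ["Carfax"] * 3,
        "city": ["Horsham"] * 3,
        "postcode": ["RH12 1EE"] * 3,
    },
    index=[16, 17, 18],
)

# duplicated() compares each row's (house_number, name) pair with EARLIER rows.
# The three pairs are all different, so no row is flagged.
row_dups = test.duplicated(subset=["house_number", "name"])
print(row_dups.tolist())  # [False, False, False]

# The question needs a within-row comparison of the two cells instead.
same_value = test["house_number"] == test["name"]
print(same_value.tolist())  # [False, True, True]
```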

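Does anyone have suggestions for first identifying the cells that are duplicated across the two columns and then setting one of the values to np.nan?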
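The two steps asked for here could be sketched as a boolean mask followed by a .loc assignment. This is one possible approach, not the only one; for brevity the frame below keeps only the two relevant columns, and it assumes house_number is the value to keep.

```python
import numpy as np
import pandas as pd

# Only the two columns being compared, with the question's index labels
test = pd.DataFrame(
    {
        "name": ["Shoecare", "33", "33A"],
        "house_number": ["32", "33", "33A"],
    },
    index=[16, 17, 18],
)

# Step 1: identify rows where name repeats house_number in the same row.
dup_mask = test["name"].eq(test["house_number"])
print(test.index[dup_mask.to_numpy()].tolist())  # [17, 18]

# Step 2: keep house_number and blank the duplicate in name.
test.loc[dup_mask, "name"] = np.nan
print(test)
```

Swapping the column label in the .loc assignment would instead keep name and blank house_number.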

Desired output:

    housename   name      house_number  street  city     postcode
16  NaN         Shoecare  32            Carfax  Horsham  RH12 1EE
17  NaN         NaN       33            Carfax  Horsham  RH12 1EE
18  NaN         NaN       33A           Carfax  Horsham  RH12 1EE

1 answer:

Answer 0 (score: 2)

If the two columns are house_number and name, you can do it like this:

import numpy as np
test['name'] = np.where(test['house_number'] == test['name'], np.nan, test['name'])  # NaN where name duplicates house_number

Output:

       city house_number  housename      name  postcode  street
16  Horsham           32        NaN  Shoecare  RH12 1EE  Carfax
17  Horsham           33        NaN       NaN  RH12 1EE  Carfax
18  Horsham          33A        NaN       NaN  RH12 1EE  Carfax
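An equivalent that stays within pandas is Series.mask, which replaces values where a condition is True and, when no replacement is given, fills with NaN. It is shown here as an alternative sketch, again on a frame trimmed to the two compared columns.

```python
import pandas as pd

# Only the two columns being compared, with the question's index labels
test = pd.DataFrame(
    {
        "name": ["Shoecare", "33", "33A"],
        "house_number": ["32", "33", "33A"],
    },
    index=[16, 17, 18],
)

# mask() sets name to NaN wherever the condition is True and
# returns a Series that keeps test's index.
test["name"] = test["name"].mask(test["name"] == test["house_number"])
print(test)
```

The result matches the np.where version above; the difference is that np.where returns a bare NumPy array that is assigned back by position, while mask returns an index-aligned Series.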