Update multiple columns from another dataframe based on one common column in pandas

Time: 2020-04-26 03:39:42

Tags: python-3.x pandas dataframe

Given the following two dataframes:

df1:

   id city district  year  price
0   1  bjs      cyq  2018     12
1   2  bjs      cyq  2019      6
2   3   sh       hp  2018      4
3   4  shs      hpq  2019      3

df2:

   id city district  year
0   1   bj       cy  2018
1   2   bj       cy  2019
2   4   sh       hp  2019

Suppose some of the city and district values in df1 are wrong, so I need to update the city and district values in df1 with those from df2, matched on id. My expected result is:

   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3   sh       hp  2018      4
3   4   sh       hp  2019      3

How can I do this in pandas? Thanks.

Update

Solution 1:

cities = df2.set_index('id')['city']
district = df2.set_index('id')['district']

df1['city'] = df1['id'].map(cities)
df1['district'] = df1['id'].map(district)

Solution 2:

df1[["city","district"]] = pd.merge(df1,df2,on=["id"],how="left")[["city_y","district_y"]]
print(df1)

Out:

   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3  NaN      NaN  2018      4
3   4   sh       hp  2019      3

Note that city and district for id 3 are NaN, but I want to keep the values from df1 for that row.
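The NaN appears because id 3 has no match in df2, so the lookup returns nothing for it. A minimal sketch of one way to keep df1's values in that case: map each target column through an id-indexed lookup built from df2, then fall back to df1's own value with fillna. It rebuilds the sample frames from the question; the names lookup and the loop over ["city", "district"] are illustrative, and it assumes id is unique in df2.

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

# Lookup table keyed by id (assumes id is unique in df2)
lookup = df2.set_index("id")

for col in ["city", "district"]:
    # map() yields NaN for ids absent from df2; fillna() puts df1's original value back there
    df1[col] = df1["id"].map(lookup[col]).fillna(df1[col])

print(df1)
```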

3 answers:

Answer 0 (score: 3)

Try combine_first:

df2.set_index('id').combine_first(df1.set_index('id')).reset_index()

Output:

   id city district  price    year
0   1   bj       cy   12.0  2018.0
1   2   bj       cy    6.0  2019.0
2   3   sh       hp    4.0  2018.0
3   4   sh       hp    3.0  2019.0
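combine_first fills null values in the calling frame (df2 indexed by id) with values from the other frame (df1), and the result's row and column indexes are the union of both, which is why id 3 keeps its df1 values. In the output above the columns are reordered and year/price are shown as floats. A sketch, assuming you want df1's original column order and dtypes back, reindexes the columns and casts with astype:

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

out = (
    df2.set_index("id")
    .combine_first(df1.set_index("id"))  # df2 values win; df1 fills the gaps
    .reset_index()
)
# Restore df1's column order, then its dtypes (no NaN remains, so the int casts are safe)
out = out[df1.columns].astype(df1.dtypes.to_dict())

print(out)
```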

Answer 1 (score: 1)

Try this:

df1[["city","district"]] = pd.merge(df1,df2,on=["id"],how="left")[["city_y","district_y"]]
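This assignment works positionally: a left merge keeps the left frame's key order and returns a fresh 0..n-1 index, so its rows line up with df1. It also writes NaN for id 3, as the update in the question shows. A sketch that keeps df1's values for unmatched ids, assuming id is unique in df2 (otherwise the left merge would duplicate rows), merges only the needed columns under an illustrative "_new" suffix and falls back with fillna:

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

cols = ["city", "district"]
# Bring in only id plus the columns to update; df2's copies get the "_new" suffix
merged = df1.merge(df2[["id"] + cols], on="id", how="left", suffixes=("", "_new"))

for col in cols:
    # Prefer df2's value; keep df1's value where the id had no match in df2
    df1[col] = merged[f"{col}_new"].fillna(merged[col])

print(df1)
```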

Answer 2 (score: 1)

IIUC, we can use .map:

Edit: the input has changed.

import numpy as np

target_cols = ['city','district']

# Blank out the target columns for every id that also appears in df2
df1.loc[df1['id'].isin(df2['id']),target_cols] = np.nan

cities = df2.set_index('id')['city']
district = df2.set_index('id')['district']

# Refill the blanked cells from df2; rows whose id is not in df2 keep their df1 values
df1['city'] = df1['city'].fillna(df1['id'].map(cities))
df1['district'] = df1['district'].fillna(df1['id'].map(district))

print(df1)

   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3   sh       hp  2018      4
3   4   sh       hp  2019      3
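Another option, not taken from the answers above, is DataFrame.update, which modifies a frame in place using the non-NA values of another frame, aligned on index and column labels. Rows of df1 with no match in df2 are therefore left untouched. A sketch, assuming id is unique in both frames so it can serve as the index (the names updated and cols are illustrative):

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

cols = ["city", "district"]
# Index both frames by id so update() aligns rows on id rather than on position
updated = df1.set_index("id")
# In place; only the columns passed in are touched, and df1 rows without a df2 match keep their values
updated.update(df2.set_index("id")[cols])
df1 = updated.reset_index()

print(df1)
```

Because only city and district are passed to update, year and price keep their original integer dtype.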