Given the following two dataframes:
df1:
   id city district  year  price
0   1  bjs      cyq  2018     12
1   2  bjs      cyq  2019      6
2   3   sh       hp  2018      4
3   4  shs      hpq  2019      3
df2:
   id city district  year
0   1   bj       cy  2018
1   2   bj       cy  2019
2   4   sh       hp  2019
Suppose some of the city and district values in df1 are wrong, so I need to update the city and district values in df1 with those from df2, matching on id. My expected result is:
   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3   sh       hp  2018      4
3   4   sh       hp  2019      3
How can I do this in pandas? Thanks.
Update:
Solution 1:
cities = df2.set_index('id')['city']
district = df2.set_index('id')['district']
df1['city'] = df1['id'].map(cities)
df1['district'] = df1['id'].map(district)
Solution 2:
df1[["city","district"]] = pd.merge(df1,df2,on=["id"],how="left")[["city_y","district_y"]]
print(df1)
Out:
   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3  NaN      NaN  2018      4
3   4   sh       hp  2019      3
Note that city and district are NaN for id 3, but I want to keep the values from df1.
Answer 0 (score: 3)
Try combine_first:
df2.set_index('id').combine_first(df1.set_index('id')).reset_index()
Output:
   id city district  price    year
0   1   bj       cy   12.0  2018.0
1   2   bj       cy    6.0  2019.0
2   3   sh       hp    4.0  2018.0
3   4   sh       hp    3.0  2019.0
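In the output shown, price and year come back as floats and the columns are sorted alphabetically. A minimal sketch, assuming the two frames from the question, of one way to put df1's column order and dtypes back after combine_first (the variable name result is only illustrative):

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

# df2's values win where present; df1 fills the gaps (id 3 and the price column)
result = (
    df2.set_index("id")
    .combine_first(df1.set_index("id"))
    .reset_index()
)

# Restore df1's column order and cast each column back to df1's dtype
result = result[df1.columns].astype(df1.dtypes.to_dict())
print(result)
```

The cast back to integers is safe here only because no NaN remains in year or price once the two frames are combined.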
Answer 1 (score: 1)
Try this:
df1[["city","district"]] = pd.merge(df1,df2,on=["id"],how="left")[["city_y","district_y"]]
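Note that with how="left", rows of df1 whose id does not appear in df2 (id 3 in the question) get NaN in both columns. A minimal sketch of one way to keep df1's values for those rows, assuming df1 has a default integer index and the ids in df2 are unique; the "_new" suffix and the names cols and merged are illustrative choices, not part of the answer above:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

cols = ["city", "district"]

# Left merge keeps every row of df1; df2's columns get the "_new" suffix
merged = df1.merge(df2[["id"] + cols], on="id", how="left", suffixes=("", "_new"))

for col in cols:
    # Prefer df2's value; fall back to df1's value when the id is missing from df2
    df1[col] = merged[col + "_new"].fillna(merged[col])

print(df1)
```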
Answer 2 (score: 1)
IIUC, we can use .map.
Edit: the input has changed.
import numpy as np

target_cols = ['city', 'district']
# Blank out both columns for ids that also appear in df2
df1.loc[df1['id'].isin(df2['id']), target_cols] = np.nan
# Lookup Series: id -> corrected value
cities = df2.set_index('id')['city']
district = df2.set_index('id')['district']
# Fill the blanked cells from df2; all other rows keep their df1 values
df1['city'] = df1['city'].fillna(df1['id'].map(cities))
df1['district'] = df1['district'].fillna(df1['id'].map(district))
print(df1)
   id city district  year  price
0   1   bj       cy  2018     12
1   2   bj       cy  2019      6
2   3   sh       hp  2018      4
3   4   sh       hp  2019      3
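The same .map idea can also be written without blanking the columns first: map each id to df2's value, then fall back to df1's existing value where the id is not in df2. A minimal sketch, assuming the ids in df2 are unique (the name lookup is illustrative):

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["bjs", "bjs", "sh", "shs"],
    "district": ["cyq", "cyq", "hp", "hpq"],
    "year": [2018, 2019, 2018, 2019],
    "price": [12, 6, 4, 3],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "city": ["bj", "bj", "sh"],
    "district": ["cy", "cy", "hp"],
    "year": [2018, 2019, 2019],
})

# Index df2 by id so each column can serve as an id -> value lookup
lookup = df2.set_index("id")

for col in ["city", "district"]:
    # Ids missing from df2 map to NaN, which fillna replaces with df1's own value
    df1[col] = df1["id"].map(lookup[col]).fillna(df1[col])

print(df1)
```

Looping over the column names keeps each column paired with its own lookup, so a second column cannot accidentally be filled from the first column's values.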