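I have been trying to use the pd.merge function correctly, but I either get an error message or the table comes out formatted in a way I don't want. I read through the documentation but could not find a way to merge only specific columns. For example, say I am working with these two dataframes.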
df_1 =
county_name  accidents  pedestrians
ADAMS        1          2
ALLEGHENY    1          3
ARMSTRONG    3          4
BEDFORD      1          1
df_2 =
county_name  population
ADAMS        102336
ALLEGHENY    1223048
ARMSTRONG    65642
BEDFORD      166140
BERKS        48480
BLAIR        417854
BRADFORD     123457
BUCKS        60853
CAMBRIA      628341
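The outcome I am looking for is shown below: the county names from df_2 are added to the "county_name" column without duplicates, and the "population" column is left out.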
df_outcome =
county_name  accidents  pedestrians
ADAMS        1          2
ALLEGHENY    1          3
ARMSTRONG    3          4
BEDFORD      1          1
BERKS        NaN        NaN
BLAIR        NaN        NaN
BRADFORD     NaN        NaN
BUCKS        NaN        NaN
CAMBRIA      NaN        NaN
Finally, I plan to use df_outcome.fillna(0) to replace all the NaN values with zero.
Answer 0 (score: 3)
Filter df_2 down to the county_name column, then use merge with a left join:
df = df_2[['county_name']].merge(df_1, how='left')
print(df)
  county_name  accidents  pedestrians
0       ADAMS        1.0          2.0
1   ALLEGHENY        1.0          3.0
2   ARMSTRONG        3.0          4.0
3     BEDFORD        1.0          1.0
4       BERKS        NaN          NaN
5       BLAIR        NaN          NaN
6    BRADFORD        NaN          NaN
7       BUCKS        NaN          NaN
8     CAMBRIA        NaN          NaN
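For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the two frames from the sample data in the question, which is an assumption about how the real data is laid out. It also passes on="county_name" explicitly, although the one-liner above relies on merge picking the shared column by default. The last step is the fillna(0) the question plans to apply, plus a cast back to integers, since the left join turns the count columns into floats.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frames).
df_1 = pd.DataFrame({
    "county_name": ["ADAMS", "ALLEGHENY", "ARMSTRONG", "BEDFORD"],
    "accidents": [1, 1, 3, 1],
    "pedestrians": [2, 3, 4, 1],
})
df_2 = pd.DataFrame({
    "county_name": ["ADAMS", "ALLEGHENY", "ARMSTRONG", "BEDFORD", "BERKS",
                    "BLAIR", "BRADFORD", "BUCKS", "CAMBRIA"],
    "population": [102336, 1223048, 65642, 166140, 48480,
                   417854, 123457, 60853, 628341],
})

# Select only the key column of df_2 so "population" never enters the result,
# then left-join df_1 onto it so every county in df_2 is kept.
df_outcome = df_2[["county_name"]].merge(df_1, on="county_name", how="left")

# Counties missing from df_1 have NaN counts; replace them with 0
# and convert the count columns back to integers.
df_outcome = df_outcome.fillna(0).astype({"accidents": int, "pedestrians": int})

print(df_outcome)
```

Selecting df_2[["county_name"]] with double brackets keeps it a DataFrame rather than a Series, which is what lets merge be called on it directly.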
Answer 1 (score: 1)
Try:
Serializable