I want to concatenate/merge two pandas DataFrames, but I am not getting the correct result. I have the following DataFrames:
df1
Username | User_trim
-------------------------------
0 Maria M | Maria
1 FakeName | N/A
2 Achim B | Achim
3 FlashMaster11 | N/A
4 Fakename2 | N/A
5 Gustav W | Gustav
df2
0 |1 | 2
---------------------------------
0 Maria M | Maria | female
2 Achim B | Achim | male
5 Gustav W | Gustav | male
I would like to get the following result DataFrame:
Username | User_trim | Gender
---------------------------------
0 Maria M | Maria | female
1 FakeName | N/A | N/A
2 Achim B | Achim | male
3 FlashMaster11 | N/A | N/A
4 Fakename2 | N/A | N/A
5 Gustav W | Gustav | male
I tried the following code:
result = pd.concat([df1,df2], axis=1,ignore_index=True)
This gave me the wrong result, although the table had the right length. So I tried this:
df1.merge(df2,how='outer', left_on='Username', right_on=0)
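This code seems to give me the correct values, but the resulting table has more rows than df1.
I don't mind getting all the columns when I merge the DataFrames; I can drop the ones I don't need afterwards. The only problem is merging two DataFrames of different lengths so that the values end up in the correct rows.
Can anyone suggest how to get the result table?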
Answer 0 (score: 1)
I think you need a left join in merge:
df = df1.merge(df2,how='left', left_on='Username', right_on=0)
print (df)
Username User_trim 0 1 2
0 Maria M Maria Maria M Maria female
1 FakeName NaN NaN NaN NaN
2 Achim B Achim Achim B Achim male
3 FlashMaster11 NaN NaN NaN NaN
4 Fakename2 NaN NaN NaN NaN
5 Gustav W Gustav Gustav W Gustav male
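One possible reason why how='outer' returned more rows than df1 is that df2 contained keys that do not occur in df1. A left join keeps exactly the rows of df1, as long as the keys in df2 are unique. The following is only a minimal sketch of that difference: the data is reconstructed from the question, the missing values are assumed to be real NaN rather than the string 'N/A', and the extra 'Unknown U' row is hypothetical, added only to show the effect.

```python
import pandas as pd

# Data reconstructed from the question; missing values assumed to be NaN.
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", None, "Achim", None, None, "Gustav"],
})
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
)

# Hypothetical extra row whose key does not exist in df1.
extra = pd.DataFrame([["Unknown U", "Unknown", "male"]], index=[6])
df2_extra = pd.concat([df2, extra])

# Left join: every row of df1 is kept, unmatched df2 rows are dropped.
left = df1.merge(df2_extra, how="left", left_on="Username", right_on=0)

# Outer join: unmatched rows from both sides are kept.
outer = df1.merge(df2_extra, how="outer", left_on="Username", right_on=0)

print(len(df1), len(left), len(outer))  # 6 6 7
```

If the left join also returns more rows than df1, it is worth checking df2 for duplicated keys, because each duplicate produces one additional output row.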
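If you want merge to append only the new column, so that you do not have to drop the unneeded columns afterwards, first rename at least the join column (here to Username, so that it has the same name in both DataFrames) and then select only the necessary columns (always the join column plus all new columns):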
df22 = df2.rename(columns={0:'Username', 2:'Gender'})[['Username', 'Gender']]
print (df22)
Username Gender
0 Maria M female
1 Achim B male
2 Gustav W male
df = df1.merge(df22,how='left', on='Username')
print (df)
Username User_trim Gender
0 Maria M Maria female
1 FakeName NaN NaN
2 Achim B Achim male
3 FlashMaster11 NaN NaN
4 Fakename2 NaN NaN
5 Gustav W Gustav male
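The rename-and-select approach above can be sketched end to end as follows. The data is reconstructed from the question (missing values assumed to be NaN), and the names lookup and result are illustrative only. The validate argument is an optional addition that is not part of the original answer: it asks merge to raise an error if the join key is not unique in the right-hand table, which would otherwise silently add rows.

```python
import pandas as pd

# Data reconstructed from the question; missing values assumed to be NaN.
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", None, "Achim", None, None, "Gustav"],
})
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
)

# Keep only the join column and the new column, under matching names.
lookup = df2.rename(columns={0: "Username", 2: "Gender"})[["Username", "Gender"]]

# validate="many_to_one" raises MergeError if Username is duplicated in lookup.
result = df1.merge(lookup, how="left", on="Username", validate="many_to_one")
print(result)
```

Because the join is done on the Username column, the index of df2 plays no role here; only the key values matter.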
If you only need to add a single new column, use map with a Series created by set_index:
df1['Gender'] = df1['Username'].map(df2.set_index(0)[2])
print (df1)
Username User_trim Gender
0 Maria M Maria female
1 FakeName NaN NaN
2 Achim B Achim male
3 FlashMaster11 NaN NaN
4 Fakename2 NaN NaN
5 Gustav W Gustav male
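A minimal sketch of the map variant, again with data reconstructed from the question and missing values assumed to be NaN. Two details here are my own additions rather than part of the answer: drop_duplicates guards against repeated usernames in df2, since as far as I know map needs a lookup Series with a unique index, and assign returns a new DataFrame instead of modifying df1 in place.

```python
import pandas as pd

# Data reconstructed from the question; missing values assumed to be NaN.
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", None, "Achim", None, None, "Gustav"],
})
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
)

# Lookup Series: index = username (column 0), values = gender (column 2).
gender_by_user = df2.drop_duplicates(subset=[0]).set_index(0)[2]

# Usernames without an entry in the lookup Series become NaN.
result = df1.assign(Gender=df1["Username"].map(gender_by_user))
print(result)
```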
Answer 1 (score: 0)
Since your indices are already aligned, you can align the column names and then use pd.DataFrame.combine_first:
df2 = df2.rename(columns={0: 'Username', 1: 'User_trim', 2: 'Gender'})
res = df1.combine_first(df2)
print(res)
# Gender User_trim Username
# 0 female Maria Maria M
# 1 NaN N/A FakeName
# 2 male Achim Achim B
# 3 NaN N/A FlashMaster11
# 4 NaN N/A Fakename2
# 5 male Gustav Gustav W
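Note that combine_first matches rows by index label, not by Username, so it only works if df2 really carries the index 0, 2, 5 shown in the question. The sketch below uses data reconstructed from the question, with missing values assumed to be NaN, and contrasts the aligned case with a hypothetical df2 whose index has been reset to 0, 1, 2.

```python
import pandas as pd

# Data reconstructed from the question; missing values assumed to be NaN.
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", None, "Achim", None, None, "Gustav"],
})
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
).rename(columns={0: "Username", 1: "User_trim", 2: "Gender"})

# Aligned index: values from df2 land in rows 0, 2 and 5.
# combine_first sorts the columns, so restore the desired order explicitly.
res = df1.combine_first(df2)[["Username", "User_trim", "Gender"]]
print(res)

# Hypothetical misaligned case: after reset_index the rows are matched
# by position 0, 1, 2, so row 1 (FakeName) wrongly receives 'male'.
bad = df1.combine_first(df2.reset_index(drop=True))
print(bad.loc[1, ["Username", "Gender"]])
```

If the index of df2 cannot be trusted, joining on the Username column with merge or map is the safer choice.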