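How do I merge/join two pandas DataFrames of different lengths?

Asked: 2018-04-21 13:31:39

Tags: python python-2.7 pandas dataframe merge

I want to join/merge two pandas DataFrames, but I am not getting the right result. I have the following DataFrames: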

df1

    Username      | User_trim
-------------------------------
0   Maria M       | Maria
1   FakeName      | N/A
2   Achim B       | Achim
3   FlashMaster11 | N/A
4   Fakename2     | N/A
5   Gustav W      | Gustav


df2
    0        |1       | 2
---------------------------------
0   Maria M  | Maria  | female
2   Achim B  | Achim  | male
5   Gustav W | Gustav | male

I would like to get the following resulting DataFrame:

    Username      | User_trim | Gender
---------------------------------
0   Maria M       | Maria     | female
1   FakeName      | N/A       | N/A
2   Achim B       | Achim     | male
3   FlashMaster11 | N/A       | N/A
4   Fakename2     | N/A       | N/A
5   Gustav W      | Gustav    | male

I tried the following code:

result = pd.concat([df1,df2], axis=1,ignore_index=True)

That gave me the wrong result, although the table had the right length. So I tried this:

df1.merge(df2,how='outer', left_on='Username', right_on=0)

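This code seems to give me the correct values, but the resulting table has more rows than df1.

I don't mind if the merge brings in all the columns; I can drop the ones I don't need afterwards. The only problem is merging frames of different lengths and placing the values in the correct rows.

Can anyone suggest how to get the result table?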

2 answers:

Answer 0: (score: 1)

I think you need a left join in merge:

df = df1.merge(df2,how='left', left_on='Username', right_on=0)
print (df)
        Username User_trim         0       1       2
0        Maria M     Maria   Maria M   Maria  female
1       FakeName       NaN       NaN     NaN     NaN
2        Achim B     Achim   Achim B   Achim    male
3  FlashMaster11       NaN       NaN     NaN     NaN
4      Fakename2       NaN       NaN     NaN     NaN
5       Gustav W    Gustav  Gustav W  Gustav    male
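The difference between how='left' and how='outer' can be checked by comparing row counts. A left join keeps the rows of df1, while an outer join also emits a row for every key that exists only in df2. The sketch below rebuilds df1 from the question; the construction code, the name df2_extra, and the extra 'Ghost G' row are assumptions added purely for illustration and do not come from the question's data.

```python
import numpy as np
import pandas as pd

# df1 as shown in the question (N/A represented as NaN here, an assumption).
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", np.nan, "Achim", np.nan, np.nan, "Gustav"],
})

# Hypothetical variant of df2 with one user ('Ghost G') that is not in df1.
df2_extra = pd.DataFrame([
    ["Maria M", "Maria", "female"],
    ["Achim B", "Achim", "male"],
    ["Gustav W", "Gustav", "male"],
    ["Ghost G", "Ghost", "male"],
])

left = df1.merge(df2_extra, how="left", left_on="Username", right_on=0)

# indicator=True adds a '_merge' column telling where each row came from.
outer = df1.merge(df2_extra, how="outer", left_on="Username", right_on=0,
                  indicator=True)

print(len(df1), len(left), len(outer))

# Rows that exist only in the right-hand frame explain the extra length.
right_only = int((outer["_merge"] == "right_only").sum())
print(right_only)
```

If the outer result is longer than df1, the rows marked right_only (or duplicated keys in df2) are the likely cause.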

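If you only want merge to add the new column(s), so that you don't have to drop unneeded columns afterwards, first rename the join column in df2 so that it matches df1 (here Username), then select only the columns you need (always the join column plus all the new columns):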

df22 = df2.rename(columns={0:'Username', 2:'Gender'})[['Username', 'Gender']]
print (df22)
   Username  Gender
0   Maria M  female
1   Achim B    male
2  Gustav W    male

df = df1.merge(df22,how='left', on='Username')
print (df)
        Username User_trim  Gender
0        Maria M     Maria  female
1       FakeName       NaN     NaN
2        Achim B     Achim    male
3  FlashMaster11       NaN     NaN
4      Fakename2       NaN     NaN
5       Gustav W    Gustav    male
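Even a left join can return more rows than df1 if a Username appears more than once in the right-hand frame. As a hedged sketch, merge's validate argument can guard against that: with validate='many_to_one' pandas checks that the join key is unique on the right side and raises pandas.errors.MergeError otherwise. The frame construction below is an assumption that mirrors the question's data.

```python
import numpy as np
import pandas as pd

# df1 and df2 rebuilt from the question (construction code is assumed).
df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", np.nan, "Achim", np.nan, np.nan, "Gustav"],
})
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
)

# Keep only the join column and the new column, with matching names.
df22 = df2.rename(columns={0: "Username", 2: "Gender"})[["Username", "Gender"]]

# validate='many_to_one' fails loudly if 'Username' is duplicated in df22,
# so a successful merge cannot have multiplied rows of df1.
result = df1.merge(df22, how="left", on="Username", validate="many_to_one")
print(result)
```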

If you only need to add one new column, use map with a Series created by set_index:

df1['Gender'] = df1['Username'].map(df2.set_index(0)[2])
print (df1)
        Username User_trim  Gender
0        Maria M     Maria  female
1       FakeName       NaN     NaN
2        Achim B     Achim    male
3  FlashMaster11       NaN     NaN
4      Fakename2       NaN     NaN
5       Gustav W    Gustav    male
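The map approach relies on the lookup Series having unique index labels. A minimal sketch of making that explicit follows; the frame df2_dup with a repeated 'Maria M' row is a hypothetical input invented for this example, and keeping the first occurrence is an assumed policy, not something stated in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", np.nan, "Achim", np.nan, np.nan, "Gustav"],
})

# Hypothetical df2 in which 'Maria M' appears twice.
df2_dup = pd.DataFrame([
    ["Maria M", "Maria", "female"],
    ["Maria M", "Maria", "female"],
    ["Achim B", "Achim", "male"],
    ["Gustav W", "Gustav", "male"],
])

# Keep the first row per username, then build a username -> gender lookup.
lookup = df2_dup.drop_duplicates(subset=[0]).set_index(0)[2]

# map looks up each Username in the lookup index; unmatched names become NaN.
df1["Gender"] = df1["Username"].map(lookup)
print(df1)
```

Because map only assigns a column, the row count of df1 cannot change, which is what the question asks for.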

Answer 1: (score: 0)

Since your indices are already aligned, you can align the column names and then use pd.DataFrame.combine_first:

df2 = df2.rename(columns={0: 'Username', 1: 'User_trim', 2: 'Gender'})

res = df1.combine_first(df2)

print(res)

#    Gender User_trim       Username
# 0  female     Maria        Maria M
# 1     NaN       N/A       FakeName
# 2    male     Achim        Achim B
# 3     NaN       N/A  FlashMaster11
# 4     NaN       N/A      Fakename2
# 5    male    Gustav       Gustav W
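Note that combine_first matches rows by index label, not by Username, so it only gives the desired result while df2 keeps the labels 0, 2 and 5 shown in the question. The self-contained sketch below assumes that index and represents N/A as NaN; the explicit column reordering at the end is an addition, since the column order returned by combine_first is not the one requested in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Username": ["Maria M", "FakeName", "Achim B",
                 "FlashMaster11", "Fakename2", "Gustav W"],
    "User_trim": ["Maria", np.nan, "Achim", np.nan, np.nan, "Gustav"],
})

# df2 keeps the index labels 0, 2, 5 from the question (assumed construction).
df2 = pd.DataFrame(
    [["Maria M", "Maria", "female"],
     ["Achim B", "Achim", "male"],
     ["Gustav W", "Gustav", "male"]],
    index=[0, 2, 5],
).rename(columns={0: "Username", 1: "User_trim", 2: "Gender"})

# Values from df1 win; missing cells and columns are filled from df2
# for rows that share the same index label.
res = df1.combine_first(df2)

# Restore the column order requested in the question.
res = res[["Username", "User_trim", "Gender"]]
print(res)
```

If df2 had been reset to a default 0, 1, 2 index, the Gender values would land on the wrong rows, so the merge or map approaches are safer when the index cannot be trusted.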