Merge DataFrames and also merge their shared columns into one

Time: 2021-04-28 15:13:31

Tags: python pandas dataframe merge duplicates

I have a DataFrame df1:

  index A   B   C   D   E
0   0   92  84          
1   1   98  49          
2   2   49  68          
3   3   0   58          
4   4   91  95          
5   5   47  56  52  25  58
6   6   86  71  34  39  40
7   7   80  78  0   86  12
8   8   0   8   30  88  42
9   9   69  83  7   65  60
10  10  93  39  10  90  45

And this DataFrame df2:

  index C   D   E   F
0   0   27  95  51  45
1   1   99  33  92  67
2   2   68  37  29  65
3   3   99  25  48  40
4   4   33  74  55  66
5   13  65  76  19  62

When I merge df1 and df2, I would like to get the following result:

index   A   B   C   D   E   F
0   0   92  84  27  95  51  45
1   1   98  49  99  33  92  67
2   2   49  68  68  37  29  65
3   3   0   58  99  25  48  40
4   4   91  95  33  74  55  66
5   5   47  56  52  25  58              
6   6   86  71  34  39  40              
7   7   80  78  0   86  12              
8   8   0   8   30  88  42              
9   9   69  83  7   65  60              
10  10  93  39  10  90  45              
11  13          65  76  19  62

However, I keep getting this when I use pd.merge():

df_total=df1.merge(df2,how="outer",on="index",suffixes=(None,"_"))
df_total.replace(to_replace=np.nan,value=" ", inplace=True)
df_total

index   A   B   C   D   E   C_  D_  E_  F
0   0   92  84              27  95  51  45
1   1   98  49              99  33  92  67
2   2   49  68              68  37  29  65
3   3   0   58              99  25  48  40
4   4   91  95              33  74  55  66
5   5   47  56  52  25  58              
6   6   86  71  34  39  40              
7   7   80  78  0   86  12              
8   8   0   8   30  88  42              
9   9   69  83  7   65  60              
10  10  93  39  10  90  45              
11  13                      65  76  19  62

Is there a way to get the desired result using pd.merge or a similar function?

Thanks

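1 Answer:

Answer 0: (score: 1)

You can use .combine_first(), which keeps the values from df1 and fills its missing cells with the values at the same index and column in df2: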

import numpy as np

# convert the empty cells ("") to NaNs
df1 = df1.replace("", np.nan)
df2 = df2.replace("", np.nan)

# set indices and combine the dataframes
df1 = df1.set_index("index")
print(df1.combine_first(df2.set_index("index")).reset_index().fillna(""))

Prints:

    index     A     B     C     D     E     F
0       0  92.0  84.0  27.0  95.0  51.0  45.0
1       1  98.0  49.0  99.0  33.0  92.0  67.0
2       2  49.0  68.0  68.0  37.0  29.0  65.0
3       3   0.0  58.0  99.0  25.0  48.0  40.0
4       4  91.0  95.0  33.0  74.0  55.0  66.0
5       5  47.0  56.0  52.0  25.0  58.0      
6       6  86.0  71.0  34.0  39.0  40.0      
7       7  80.0  78.0   0.0  86.0  12.0      
8       8   0.0   8.0  30.0  88.0  42.0      
9       9  69.0  83.0   7.0  65.0  60.0      
10     10  93.0  39.0  10.0  90.0  45.0      
11     13              65.0  76.0  19.0  62.0
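
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch of the same approach. It assumes the empty cells in the question are already NaN (rather than empty strings), and the variable names combined and combined_int are only illustrative. The values come out as floats because NaN forces a float dtype; the optional last step casts to pandas' nullable Int64 dtype to get integers back.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question; empty cells are NaN here (an assumption).
df1 = pd.DataFrame({
    "index": range(11),
    "A": [92, 98, 49, 0, 91, 47, 86, 80, 0, 69, 93],
    "B": [84, 49, 68, 58, 95, 56, 71, 78, 8, 83, 39],
    "C": [np.nan] * 5 + [52, 34, 0, 30, 7, 10],
    "D": [np.nan] * 5 + [25, 39, 86, 88, 65, 90],
    "E": [np.nan] * 5 + [58, 40, 12, 42, 60, 45],
})
df2 = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 13],
    "C": [27, 99, 68, 99, 33, 65],
    "D": [95, 33, 37, 25, 74, 76],
    "E": [51, 92, 29, 48, 55, 19],
    "F": [45, 67, 65, 40, 66, 62],
})

# Align both frames on "index"; df1 values win, and df2 fills df1's gaps.
combined = (
    df1.set_index("index")
    .combine_first(df2.set_index("index"))
    .reset_index()
)

# Optional: nullable integers instead of floats; missing cells show as <NA>.
combined_int = combined.astype("Int64")
print(combined_int)
```

Note that combine_first gives df1 priority wherever both frames have a value for the same index and column. In the question's data the overlapping cells of df1 are all empty, so nothing from df2 is discarded.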