Merge multiple DataFrames into one DataFrame in Python

Time: 2019-11-07 07:24:59

Tags: python pandas dataframe merge

I have the following 4 DataFrames:

import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})

I want to merge all of these DataFrames into a single DataFrame called final.

I tried using pd.merge, but the problem is that I have to call it three separate times. Is there a simpler way?

I currently get the result with the following code:

final = pd.merge(df, df1, on='_id', how='left')
final = pd.merge(final, df2, on='_id', how='left')
final = pd.merge(final, df3, on='_id', how='left')
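One way to write the same left join only once is to fold `pd.merge` over a list with `functools.reduce`. This is a minimal sketch of an alternative that neither the question nor the answers mention, and it assumes the four frames defined above. The helper name `merge_left` is hypothetical.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})


def merge_left(left, right):
    # Hypothetical helper: one left join on the shared '_id' column.
    return pd.merge(left, right, on='_id', how='left')


# reduce applies merge_left pairwise: ((df + df1) + df2) + df3,
# which is the same sequence as the three explicit pd.merge calls.
final = reduce(merge_left, [df, df1, df2, df3])
print(final)
```

Adding a fifth frame then only means appending it to the list.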

I want the final result to look like this:

final.head()

_id | name   | count_of_apple | count_of_organge | count_of_lime
1   | Charan | 5              | 8                | NaN
2   | Kumar  | NaN            | 4                | 7
3   | Nikhil | 3              | 6                | 9
4   | Kumar  | 1              | NaN              | 2

2 answers:

Answer 0 (score: 1)

You can use concat, but first you need to make _id the index of each DataFrame with DataFrame.set_index:

dfs = [df, df1, df2, df3]

df = pd.concat([x.set_index('_id') for x in dfs], axis=1).reset_index()

This is the same as:

df = df.set_index('_id')
df1 = df1.set_index('_id')
df2 = df2.set_index('_id')
df3 = df3.set_index('_id')

df = pd.concat([df, df1, df2, df3], axis=1).reset_index()

print (df)
   _id    name  count_of_apple  count_of_organge  count_of_lime
0    1  Charan             5.0               8.0            NaN
1    2   Kumar             NaN               4.0            7.0
2    3  Nikhil             3.0               6.0            9.0
3    4   Kumar             1.0               NaN            2.0
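One detail the answer leaves implicit: `pd.concat(..., axis=1)` aligns the frames on their index using an outer join by default. Any `_id` that appears only in one of the side frames therefore still gets its own row. The chained `how='left'` merges in the question keep only the `_id` values present in `df`. The sketch below adds a hypothetical extra row (`_id` 5, present only in `df3`) to show the difference. It also shows `DataFrame.join` with a list of frames and `how='left'`, which gives left-join behaviour in one call, assuming `_id` is unique within each frame.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
# Hypothetical extra row: _id 5 exists only in df3.
df3 = pd.DataFrame({'_id': [2, 3, 4, 5], 'count_of_lime': [7, 9, 2, 4]})

frames = [x.set_index('_id') for x in [df, df1, df2, df3]]

# concat(axis=1) is an outer join on the index, so _id 5 gets a row with a NaN name.
outer = pd.concat(frames, axis=1).reset_index()

# join() with a list and how='left' keeps only the _id values of the first frame.
left = frames[0].join(frames[1:], how='left').reset_index()

print(outer)
print(left)
```

If every side frame only contains `_id` values that are also in `df`, as in the question's data, both approaches give the same rows.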

Answer 1 (score: 0)

From the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
   ...:                     'B': ['B0', 'B1', 'B2', 'B3'],
   ...:                     'C': ['C0', 'C1', 'C2', 'C3'],
   ...:                     'D': ['D0', 'D1', 'D2', 'D3']},
   ...:                    index=[0, 1, 2, 3])
   ...:

In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
   ...:                     'D': ['D2', 'D3', 'D6', 'D7'],
   ...:                     'F': ['F2', 'F3', 'F6', 'F7']},
   ...:                    index=[2, 3, 6, 7])
   ...: 

In [9]: result = pd.concat([df1, df4], axis=1, sort=False)
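The output image from the documentation is not available here. The sketch below rebuilds the same two frames and inspects `result` directly. With `axis=1`, `concat` takes the union of the row labels and places the columns side by side. Both copies of the duplicated `B` and `D` columns are kept, and cells with no source value become NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
                    'D': ['D2', 'D3', 'D6', 'D7'],
                    'F': ['F2', 'F3', 'F6', 'F7']},
                   index=[2, 3, 6, 7])

# Columns are placed side by side; rows are aligned on the union of both indexes.
result = pd.concat([df1, df4], axis=1, sort=False)

print(result.index.tolist())    # [0, 1, 2, 3, 6, 7]
print(result.columns.tolist())  # ['A', 'B', 'C', 'D', 'B', 'D', 'F']
print(result.shape)             # (6, 7)
```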

Output: [image of the concatenated DataFrame]