Question

For example: I have multiple dataframes. Each dataframe has the columns variable_code, variable_description, and year.

df1:

variable_code, variable_description 
N1, Number of returns     
N2, Number of Exemptions

df2:

variable_code, variable_description
N1, Number of returns
NUMDEP, # of dependent

I want to merge these two dataframes so that the result contains every variable_code found in df1 and df2.

variable_code, variable_description
N1, Number of returns
N2, Number of Exemptions
NUMDEP, # of dependent

Answer 1

The pandas documentation covers merging.

Since the column to merge on is called 'variable_code' in both dataframes, you can use on='variable_code'.

So the whole thing would be:

df1.merge(df2, on='variable_code')

If you want blanks wherever only one of the tables has data, specify how='outer'. If you only want data that is present in both tables (no blanks), use how='inner'.
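As a rough illustration using the question's sample data (a sketch, not part of the original answer): merging on variable_code alone leaves two description columns, which pandas suffixes with _x and _y by default. The final combine_first step is an assumption about how one might collapse them back into a single column.

```python
import pandas as pd

df1 = pd.DataFrame({
    'variable_code': ['N1', 'N2'],
    'variable_description': ['Number of returns', 'Number of Exemptions'],
})
df2 = pd.DataFrame({
    'variable_code': ['N1', 'NUMDEP'],
    'variable_description': ['Number of returns', '# of dependent'],
})

# Default how='inner': only codes present in both frames survive.
inner = df1.merge(df2, on='variable_code')

# how='outer': every code from either frame, with NaN where one side has no row.
outer = df1.merge(df2, on='variable_code', how='outer')

# Assumption (not in the original answer): collapse the suffixed columns
# into one, preferring df1's description and falling back to df2's.
outer['variable_description'] = outer['variable_description_x'].combine_first(
    outer['variable_description_y']
)
combined = outer[['variable_code', 'variable_description']]
print(combined)
```

If the descriptions for a shared code can differ between dataframes, the order of preference in combine_first decides which one is kept.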

Answer 2

First, concatenate df1 and df2 using

 final_df = pd.concat([df1, df2])

Then convert the columns variable_code and variable_description into a dictionary, with variable_code as the key and variable_description as the value:

 d = dict(zip(final_df['variable_code'], final_df['variable_description']))

Then convert d back into a dataframe:

 d_df = pd.DataFrame(list(d.items()), columns=['variable_code', 'variable_description'])
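Put together, the steps above might look like the following sketch. It uses the question's sample data and its column name variable_description. Note that a dictionary keeps the last description seen for a repeated code. The drop_duplicates line at the end is a different technique, shown only as a possible shortcut; it keeps the first occurrence instead.

```python
import pandas as pd

df1 = pd.DataFrame({
    'variable_code': ['N1', 'N2'],
    'variable_description': ['Number of returns', 'Number of Exemptions'],
})
df2 = pd.DataFrame({
    'variable_code': ['N1', 'NUMDEP'],
    'variable_description': ['Number of returns', '# of dependent'],
})

# Stack the rows of both dataframes; N1 now appears twice.
final_df = pd.concat([df1, df2], ignore_index=True)

# Dictionary keys are unique, so repeated codes collapse to one entry.
d = dict(zip(final_df['variable_code'], final_df['variable_description']))

# Turn the dictionary back into a two-column dataframe.
d_df = pd.DataFrame(list(d.items()),
                    columns=['variable_code', 'variable_description'])
print(d_df)

# Alternative (not the answer's method): drop repeated codes directly.
alt_df = final_df.drop_duplicates(subset='variable_code').reset_index(drop=True)
```

Both results agree here because the duplicated N1 row carries the same description in df1 and df2.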

Answer 3

To get the result you want, try the following:

    import pandas as pd
    from functools import reduce

    # Create the first dataframe from a dictionary (several other ways exist).
    data1 = {'variable_code': ['N1', 'N2'], 'variable_description': ['Number of returns', 'Number of Exemptions']}
    df1 = pd.DataFrame(data=data1)

    # Create the second dataframe.
    data2 = {'variable_code': ['N1', 'NUMDEP'], 'variable_description': ['Number of returns', '# of dependent']}
    df2 = pd.DataFrame(data=data2)

    # Place the dataframes in a list.
    dfs = [df1, df2]  # additional dfs can be added here

    # You could loop over the list, merging the dfs; here reduce and a lambda are used.
    resultant_df = reduce(
        lambda left, right: pd.merge(left, right, on=['variable_code', 'variable_description'], how='outer'),
        dfs,
    )

This gives:

>>> resultant_df
  variable_code  variable_description
0            N1     Number of returns
1            N2  Number of Exemptions
2        NUMDEP        # of dependent
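The code comment in this answer mentions that a plain loop works as well as reduce. A minimal sketch of that loop follows; df3 and its X1 row are hypothetical, added only to show that more than two dataframes can be merged the same way.

```python
import pandas as pd

df1 = pd.DataFrame({
    'variable_code': ['N1', 'N2'],
    'variable_description': ['Number of returns', 'Number of Exemptions'],
})
df2 = pd.DataFrame({
    'variable_code': ['N1', 'NUMDEP'],
    'variable_description': ['Number of returns', '# of dependent'],
})
# Hypothetical third dataframe, not part of the question's data.
df3 = pd.DataFrame({
    'variable_code': ['X1'],
    'variable_description': ['Hypothetical extra variable'],
})

dfs = [df1, df2, df3]
keys = ['variable_code', 'variable_description']

# Start from the first dataframe and fold each remaining one into it.
result = dfs[0]
for df in dfs[1:]:
    result = pd.merge(result, df, on=keys, how='outer')

print(result)
```

The loop and the reduce call do the same pairwise merging; which one to use is mostly a matter of taste.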

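The how parameter accepts several options, each suited to different needs. outer (used here) keeps every row, including rows that have no match in the other dataframe. See the docs for a detailed description of the other options.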
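To make the difference between the how options concrete, here is a small sketch on the question's sample data. It counts the rows each option returns and then uses merge's indicator=True flag, which is not mentioned in the original answer, to show which dataframe each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({
    'variable_code': ['N1', 'N2'],
    'variable_description': ['Number of returns', 'Number of Exemptions'],
})
df2 = pd.DataFrame({
    'variable_code': ['N1', 'NUMDEP'],
    'variable_description': ['Number of returns', '# of dependent'],
})
keys = ['variable_code', 'variable_description']

# Number of rows returned by each join type.
row_counts = {
    how: len(pd.merge(df1, df2, on=keys, how=how))
    for how in ('inner', 'left', 'right', 'outer')
}
print(row_counts)

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'.
tagged = pd.merge(df1, df2, on=keys, how='outer', indicator=True)
print(tagged)
```

Only outer returns all three codes, which is why it fits the question.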

Pandas: merging multiple dataframes
