For example: I have multiple dataframes. Each dataframe has the columns variable_code, variable_description, year.
df1:
variable_code, variable_description
N1, Number of returns
N2, Number of Exemptions
df2:
variable_code, variable_description
N1, Number of returns
NUMDEP, # of dependent
I want to merge these two dataframes so that I get every variable_code that appears in df1 or df2:
variable_code, variable_description
N1, Number of returns
N2, Number of Exemptions
NUMDEP, # of dependent
Answer 0 (score: 0)
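See the pandas documentation on merging.
Since the column to merge on has the same name, "variable_code", in both dataframes, you can pass on='variable_code'.
So the whole thing would be: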
df1.merge(df2, on='variable_code')
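If you want blanks (NaN) wherever only one of the tables has data, specify how='outer'. If you only want rows that have data in both tables (no blanks), use how='inner', which is also the default.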
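A minimal sketch of the difference between the two how values, using the sample data from the question. One thing worth noticing: because variable_description exists in both frames but is not part of the join key here, pandas keeps both copies and adds the _x / _y suffixes.

```python
import pandas as pd

# Sample frames taken from the question.
df1 = pd.DataFrame({
    "variable_code": ["N1", "N2"],
    "variable_description": ["Number of returns", "Number of Exemptions"],
})
df2 = pd.DataFrame({
    "variable_code": ["N1", "NUMDEP"],
    "variable_description": ["Number of returns", "# of dependent"],
})

# Inner join: only codes present in both frames survive.
inner = df1.merge(df2, on="variable_code", how="inner")

# Outer join: every code from either frame, NaN where one side has no match.
outer = df1.merge(df2, on="variable_code", how="outer")

# variable_description is not a join key, so both copies are kept
# and renamed variable_description_x / variable_description_y.
print(list(outer.columns))
print(inner)
print(outer)
```

If a single variable_description column is what you are after, joining on both columns (as a later answer does) avoids the suffixed duplicates.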
Answer 1 (score: 0)
First, concatenate df1 and df2:
final_df = pd.concat([df1, df2])
Then convert the variable_code and variable_description columns into a dictionary, with variable_code as the key and variable_description as the value:
d = dict(zip(final_df['variable_code'], final_df['variable_description']))
Then convert d back into a dataframe:
d_df = pd.DataFrame(list(d.items()), columns=['variable_code', 'variable_description'])
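The same concatenate-then-deduplicate idea can probably be expressed without the dictionary round-trip. The sketch below swaps in drop_duplicates, which is not what this answer uses, and assumes the column names from the question. One behavioral difference: the dict keeps the last description seen for a repeated code, while keep="first" keeps the first; pass keep="last" if the later frame should win.

```python
import pandas as pd

# Sample frames taken from the question.
df1 = pd.DataFrame({
    "variable_code": ["N1", "N2"],
    "variable_description": ["Number of returns", "Number of Exemptions"],
})
df2 = pd.DataFrame({
    "variable_code": ["N1", "NUMDEP"],
    "variable_description": ["Number of returns", "# of dependent"],
})

# Stack the frames on top of each other with a fresh 0..n-1 index.
final_df = pd.concat([df1, df2], ignore_index=True)

# Keep one row per variable_code (the first occurrence), then renumber.
d_df = (
    final_df
    .drop_duplicates(subset="variable_code", keep="first")
    .reset_index(drop=True)
)
print(d_df)
```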
Answer 2 (score: 0)
To get what you are asking for, try the following:
import pandas as pd
from functools import reduce  # reduce is not a builtin in Python 3
#Create the first dataframe, through a dictionary - several other possibilities exist.
data1 = {'variable_code': ['N1','N2'], 'variable_description': ['Number of returns','Number of Exemptions']}
df1 = pd.DataFrame(data=data1)
#Create second dataframe
data2 = {'variable_code': ['N1','NUMDEP'], 'variable_description': ['Number of returns','# of dependent']}
df2 = pd.DataFrame(data=data2)
#place the dataframes on a list.
dfs = [df1,df2] #additional dfs can be added here.
#You can loop over the list,merging the dfs. But here reduce and a lambda is used.
resultant_df = reduce(lambda left,right: pd.merge(left,right,on=['variable_code','variable_description'],how='outer'), dfs)
This gives:
>>> resultant_df
   variable_code  variable_description
0             N1     Number of returns
1             N2  Number of Exemptions
2         NUMDEP        # of dependent
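The how parameter has several options, each suited to different needs. outer (used here) keeps rows from either dataframe even when the other has no matching data. See the docs for a detailed description of the other options.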
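To see what how='outer' is doing row by row, one option is the indicator parameter of pd.merge, which adds a _merge column saying where each row came from. A small sketch on the question's sample data (the variable name tagged is just for illustration):

```python
import pandas as pd

# Sample frames taken from the question.
df1 = pd.DataFrame({
    "variable_code": ["N1", "N2"],
    "variable_description": ["Number of returns", "Number of Exemptions"],
})
df2 = pd.DataFrame({
    "variable_code": ["N1", "NUMDEP"],
    "variable_description": ["Number of returns", "# of dependent"],
})

# indicator=True adds a "_merge" column with one of:
# "left_only", "right_only", or "both".
tagged = pd.merge(
    df1,
    df2,
    on=["variable_code", "variable_description"],
    how="outer",
    indicator=True,
)
print(tagged)
```

Rows tagged left_only or right_only are the ones an inner merge would have dropped.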