I have two DataFrames that I want to merge on the two columns they have in common, YEAR and COUNTRY.
The DataFrames look like this:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'YEAR': [2016,2016,2016,2015,2015,2015,1990,1990,1990], 'COUNTRY': ['Brazil', 'Albania', 'Chile','Brazil', 'Albania', 'Chile','Brazil', 'Albania', 'Chile'],'SCORE_1': [1.1, 2.1, 3.1, 1.2, 2.2, 3.2, 1.3, 2.3, 3.3], 'VARIABLE_A': [40,50,60,45,55,65,20,30,35], 'VARIABLE_B': [110,210,310,120,220,320,130,230,330]})
df2 = pd.DataFrame({'YEAR': [1990,1990,1990,2014,2014,2014,2015,2015,2015,2017,2017], 'COUNTRY': ['Australia', 'Brazil', 'Chile','Australia', 'Brazil', 'Chile','Australia', 'Brazil', 'Chile','Australia', 'Brazil'],
'SCORE_2': [1001,1002,1003,2001,2002,2003,3001,3002,3003,4001,4002]})
print(df1)
print(df2)
I would like the merged result to look like this:
df_final = pd.DataFrame({'YEAR': [2017,2017,2016,2016,2016,2015,2015,2015,1990,1990,1990], 'COUNTRY': ['Australia','Brazil','Albania','Brazil','Chile','Albania','Brazil','Chile','Albania','Brazil','Chile'],'SCORE_1': [np.nan, np.nan, 1.1, 2.1, 3.1, 1.2, 2.2, 3.2, 1.3, 2.3, 3.3],'VARIABLE_A': [np.nan,np.nan,40,50,60,45,55,65,20,30,35], 'VARIABLE_B': [np.nan,np.nan,110,210,310,120,220,320,130,230,330],
'SCORE_2': [4001,4002, np.nan,np.nan,np.nan,np.nan,3002,3003,np.nan,1002,1003]})
print(df_final)
I tried pd.merge(df1, df2, how='outer', on=['COUNTRY', 'YEAR']), but it did not work. Can anyone help? I am still a beginner, and this is my first question here.
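
For reference, here is a trimmed-down sketch of that attempt so the behaviour is easy to reproduce. The small frames below are a cut-down stand-in for df1 and df2, not the real data. The `indicator=True` flag and the sort order (YEAR descending, COUNTRY ascending) are additions I am assuming would help with inspecting the result; they are not part of my original call.

```python
import pandas as pd

# Cut-down stand-ins for df1 and df2 (illustrative rows only).
df1 = pd.DataFrame({
    'YEAR': [2016, 2015, 1990],
    'COUNTRY': ['Brazil', 'Brazil', 'Chile'],
    'SCORE_1': [1.1, 1.2, 3.3],
})
df2 = pd.DataFrame({
    'YEAR': [1990, 2015, 2017],
    'COUNTRY': ['Chile', 'Brazil', 'Australia'],
    'SCORE_2': [1003, 3002, 4001],
})

# Outer merge on both key columns: keeps every (COUNTRY, YEAR) pair
# that appears in either frame and fills the missing side with NaN.
# indicator=True adds a _merge column showing where each row came from.
merged = (
    pd.merge(df1, df2, how='outer', on=['COUNTRY', 'YEAR'], indicator=True)
    .sort_values(['YEAR', 'COUNTRY'], ascending=[False, True])
    .reset_index(drop=True)
)
print(merged)
```

An outer merge keeps rows that exist in only one of the frames, so if some of those rows are not wanted in the final table, they would have to be filtered out afterwards, for example by using the `_merge` column.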