Question

I have two dataframes, df1 and df2, and I am trying to find a way to produce df3 as shown in the screenshot:

So the goal is to keep all rows of df1 and add the rows of df2 below them. However, I want a single row wherever Name, Lat and Lon match, so Name, Lat and Lon act as the key.

There is also the issue of the ZIP column. For rows that are joined, I want to keep the ZIP value from df1.

I tried:

df3=pandas.merge(df1,df2,on=['Name','Lat','Lon'],how='outer')

This produced something close to what I want:

As you can see, the dataframe above ends up with two different ZIP columns and two different address columns.
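A small reproducible version of the problem, assuming hypothetical values (the original screenshots are not available) and the column names Name, Lat, Lon, ZIP and Address, shows where the duplicated columns come from: pandas.merge adds its default suffixes _x and _y to every non-key column that exists in both frames.

```python
import pandas as pd

# Hypothetical sample data; the real frames were only shown as screenshots.
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Lon": [10.0, 20.0],
    "ZIP": ["11111", "22222"],
    "Address": ["1 Main St", None],
})
df2 = pd.DataFrame({
    "Name": ["B", "C"],
    "Lat": [2.0, 3.0],
    "Lon": [20.0, 30.0],
    "ZIP": ["99999", "33333"],
    "Address": ["2 Oak Ave", "3 Elm Rd"],
})

# Outer merge on the three key columns, as in the question.
merged = pd.merge(df1, df2, on=["Name", "Lat", "Lon"], how="outer")

# Shared non-key columns are split into suffixed copies.
print(sorted(merged.columns))
# ['Address_x', 'Address_y', 'Lat', 'Lon', 'Name', 'ZIP_x', 'ZIP_y']
```

Key B matches and yields one row; A and C each appear once with NaN in the columns coming from the other frame.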

Any ideas on how to get a clean df3 dataframe?

Answer 1

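I don't think 'merge' is right for this task (i.e., joining the left DF onto the right DF), because what you are really doing is putting one DF on top of the other and then removing duplicates. So you could try something like: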

import pandas

# put one DF 'on top' of the other (like-named columns line up automatically)
df3 = pandas.concat([df1, df2])
# get rid of rows that are exact duplicates across all columns
df3.drop_duplicates(inplace=True)
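Note that drop_duplicates() with no arguments only removes rows that are identical in every column, so a key that appears in both frames with a different ZIP survives as two rows. A possible variation, assuming the key columns are exactly Name, Lat and Lon, is to pass subset= together with keep='first': because df1 comes first in the concat, its row (and therefore its ZIP) is the one kept for each shared key. Values missing from df1's row are not filled in from df2 with this variation. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: key B exists in both frames with different ZIPs.
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Lon": [10.0, 20.0],
    "ZIP": ["11111", "22222"],
})
df2 = pd.DataFrame({
    "Name": ["B", "C"],
    "Lat": [2.0, 3.0],
    "Lon": [20.0, 30.0],
    "ZIP": ["99999", "33333"],
})

keys = ["Name", "Lat", "Lon"]
stacked = pd.concat([df1, df2], ignore_index=True)

# Plain drop_duplicates keeps both B rows because their ZIPs differ.
plain = stacked.drop_duplicates()

# Deduplicating on the key columns keeps the first (df1) row per key.
df3 = stacked.drop_duplicates(subset=keys, keep="first").reset_index(drop=True)
print(df3)
```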

EDIT

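Based on your feedback, I realize a somewhat messier solution is needed. You would use merge, and then fill the NaNs in each original column from its duplicate column. Something like: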
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'test': [1, 2, 3, 6, np.nan, np.nan]})
df2 = pd.DataFrame({'test': [np.nan, np.nan, 3, 6, 10, 24]})

# some merge statement to get them together into the var 'df'
# (overlapping column names get the suffixes '_x' and '_y')
df = pd.merge(df1, df2, left_index=True, right_index=True)

# collect the _x columns
original_cols = [x for x in df.columns if x.endswith('_x')]
for col in original_cols:
    base = col[:-2]  # column name without the '_x' suffix
    duplicate = base + '_y'
    # use the duplicate column to fill the NaNs of the original column
    # (assign back: inplace fillna on df[col] may not update df in recent pandas)
    df[col] = df[col].fillna(df[duplicate])
    # drop the duplicate
    df = df.drop(columns=duplicate)
    # rename the original to remove the '_x'
    df = df.rename(columns={col: base})

Let me know if this works.
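The same "prefer df1, fill gaps from df2" logic can also be written with DataFrame.combine_first, a swapped-in alternative to the manual suffix loop rather than what the answer itself uses. This sketch assumes hypothetical data and that Name, Lat and Lon uniquely identify a row within each frame. combine_first aligns on the union of both indexes, which comes out sorted by key, so the reindex step restores the order asked for in the question: df1's rows first, then keys found only in df2.

```python
import pandas as pd

keys = ["Name", "Lat", "Lon"]

# Hypothetical sample data; df1 has no Address for key B.
df1 = pd.DataFrame({
    "Name": ["B", "A"],
    "Lat": [2.0, 1.0],
    "Lon": [20.0, 10.0],
    "ZIP": ["22222", "11111"],
    "Address": [None, "1 Main St"],
})
df2 = pd.DataFrame({
    "Name": ["B", "C"],
    "Lat": [2.0, 3.0],
    "Lon": [20.0, 30.0],
    "ZIP": ["99999", "33333"],
    "Address": ["2 Oak Ave", "3 Elm Rd"],
})

left = df1.set_index(keys)
right = df2.set_index(keys)

# df1 values win; missing cells in df1 are filled from df2; df2-only keys are added.
combined = left.combine_first(right)

# Put all df1 keys first, then the keys that only appear in df2.
order = left.index.append(right.index.difference(left.index))
df3 = combined.reindex(order).rename_axis(keys).reset_index()
print(df3)
```

For key B this keeps df1's ZIP ("22222") while taking the Address from df2, which matches the behaviour described in the question.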

Concatenate dataframe rows and match them when the keys are the same
