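Merging DataFrames in Pandas

Time: 2017-08-10 11:24:23

Tags: python pandas

I have two DataFrames. The first (df1) contains Name, ID, and PIN. The second (df2) contains Identifier, City, and Country. The DataFrames are shown below.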

df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"], "ID": ["S01", "A01", "L02", "L03", "C01"], "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})


df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"], "City": ["Moscow", "Seoul", "Beijing"], "Country": ["Russia", "Korea", "China"]})

I want to merge the DataFrames wherever the Name, ID, or PIN matches the Identifier in df2. The expected output is:

      City Country          Name     PIN Student ID
0   Moscow  Russia           Sam   SM392        S01
1        0       0          Ajay    AA09        A01
2    Seoul   Korea           Lee  Lee101        L02
3        0       0  Lee Yong Dae  Lee201        L03
4  Beijing   China       Cai Yun    C101        C01

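2 Answers:

Answer 0: (score: 1)

This may not be the most elegant solution, but it works for me. You have to perform three separate merges and combine the results.

The code below gives the expected output (using NaN rather than 0 for the rows of df1 that have no match).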

import pandas as pd

# Initial data
df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"], "ID": ["S01", "A01", "L02", "L03", "C01"], "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})

df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"], "City": ["Moscow", "Seoul", "Beijing"], "Country": ["Russia", "Korea", "China"]})

def merge_three(df1, df2):

    # Perform three separate left merges, one per candidate key column.
    # A left merge keeps the rows of df1 in order, so the three results
    # line up row by row (this assumes Identifier is unique in df2).
    by_id = df1.merge(df2, how='left', left_on='ID', right_on='Identifier')
    by_name = df1.merge(df2, how='left', left_on='Name', right_on='Identifier')
    by_pin = df1.merge(df2, how='left', left_on='PIN', right_on='Identifier')

    # Start from the ID merge and keep only the columns of interest
    df_final = by_id[['City', 'Country', 'Name', 'PIN', 'ID']].copy()

    # Fill each NaN from whichever of the other two merges found a match
    for col in ['City', 'Country']:
        df_final[col] = by_id[col].combine_first(by_name[col]).combine_first(by_pin[col])

    return df_final

df_out = merge_three(df1, df2)

Output:

df_out
      City Country          Name     PIN   ID
0   Moscow  Russia           Sam   SM392  S01
1      NaN     NaN          Ajay    AA09  A01
2    Seoul   Korea           Lee  Lee101  L02
3      NaN     NaN  Lee Yong Dae  Lee201  L03
4  Beijing   China       Cai Yun    C101  C01
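If the 0 placeholders and the "Student ID" header from the question are wanted, a small post-processing step should be enough. The following is only a sketch: it rebuilds df_out by hand from the printout above so that it runs on its own, and the name `filled` is made up for illustration. It also assumes that mixing the integer 0 with strings in one column is acceptable.

```python
import numpy as np
import pandas as pd

# df_out as printed above, rebuilt by hand so the snippet runs on its own
df_out = pd.DataFrame({
    "City": ["Moscow", np.nan, "Seoul", np.nan, "Beijing"],
    "Country": ["Russia", np.nan, "Korea", np.nan, "China"],
    "Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"],
    "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"],
    "ID": ["S01", "A01", "L02", "L03", "C01"],
})

# Cast to object first so an integer 0 can sit next to the strings,
# then replace the missing values in the two looked-up columns only
filled = (
    df_out.astype({"City": object, "Country": object})
          .fillna({"City": 0, "Country": 0})
          .rename(columns={"ID": "Student ID"})  # header used in the question
)
print(filled)
```

Restricting fillna to City and Country avoids accidentally overwriting missing values in the df1 columns, should there ever be any.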

Answer 1: (score: 0)

I'm not sure, but maybe this is what you are looking for:

# One inner merge per candidate key column; each keeps only the matching rows
a = df1.merge(df2, left_on='ID', right_on='Identifier')
b = df1.merge(df2, left_on='Name', right_on='Identifier')
c = df1.merge(df2, left_on='PIN', right_on='Identifier')
# Stack the three results (DataFrame.append was removed in pandas 2.0)
df = pd.concat([a, b, c])
df
    ID     Name     PIN     City Country Identifier
0  L02      Lee  Lee101    Seoul   Korea        L02
0  S01      Sam   SM392   Moscow  Russia        Sam
0  C01  Cai Yun    C101  Beijing   China       C101
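This result holds only the rows that matched, whereas the question also wants the unmatched rows of df1. One possible way to get there is to left-merge the matches back onto df1. This is a sketch, not part of the original answer: the names `keys`, `matched`, and `full` are made up for illustration, and it assumes ID uniquely identifies a row of df1.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"],
                    "ID": ["S01", "A01", "L02", "L03", "C01"],
                    "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})
df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"],
                    "City": ["Moscow", "Seoul", "Beijing"],
                    "Country": ["Russia", "Korea", "China"]})

# Collect the matching rows for each candidate key column
keys = ["ID", "Name", "PIN"]
matched = pd.concat(
    [df1.merge(df2, left_on=k, right_on="Identifier") for k in keys],
    ignore_index=True,
)

# A row of df1 could match on more than one column; keep the first hit
# (assumption: ID is unique in df1, so it can stand in for the row)
matched = matched.drop_duplicates(subset="ID")

# Left-merge back onto df1 so unmatched rows survive with NaN
full = df1.merge(matched[["ID", "City", "Country"]], on="ID", how="left")
print(full)
```

Rows without a match end up with NaN in City and Country, the same as in the first answer.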