Merge 3 different dataframes based on conditions

Time: 2017-03-09 17:17:18

Tags: python python-3.x pandas

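How can I combine the three dataframes shown below?

The relationship between the first two must be based on ID1, since that is the key on which the two dataframes match.

For the third dataframe, Address2 must match so that the Hash value can be added.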

DF1:

Name1   Name2  Name3   Address    ID1     ID2    Own
Matt    John1  Jill     878 home   1       0     Deal
Matt    John2  Jack     879 home   2       1     Dael

DF2:

Name1   ID1   Address   Name4     Address2
Matt    1     878 home  face1     face\123
Matt    1     878 home  face2     face\345
Matt    1     878 home  face3     face\678    
Matt    2     879 home  head1     head\123
Matt    2     879 home  head2     head\345
Matt    2     879 home  head3     head\678

DF3:

Address2     Hash
face\123     abc123
face\345     cde321
face\678     efg123
head\123     123efg
head\345     efg321
head\678     acd321

I am trying to merge the three dataframes into something like this:

Name1   Name2   ID1 Address     Own    Name3    ID2 Name4   Address2    Hash
Matt    John1   1   878 home    Deal    Jill    0   face1   face\123    abc123
Matt    John1   1   878 home    Deal    Jill    0   face2   face\345    cde321
Matt    John1   1   878 home    Deal    Jill    0   face3   face\678    efg123
Matt    John2   2   879 home    Dael    Jack    1   head1   head\123    123efg
Matt    John2   2   879 home    Dael    Jack    1   head2   head\345    efg321
Matt    John2   2   879 home    Dael    Jack    1   head3   head\678    acd321

Between df1 and df2 the key is ID1; between df2 and df3 the key is Address2.

Thank you very much for your help.

3 answers:

Answer 0 (score: 1)

Take a look at the merge function; some examples can be found here. For your specific problem, try the following:

combined_df = df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")
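To make the one-liner above easy to try, here is a minimal sketch that rebuilds the three frames from the question's tables and chains the two merges. The frame contents are copied from the question; writing the Address2 values as raw strings is my own choice, since in a normal Python string literal `\123` would be read as an octal escape.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "ID2": [0, 1],
    "Own": ["Deal", "Dael"],
})

# Raw strings keep the backslash in Address2 literal.
address2 = [r"face\123", r"face\345", r"face\678",
            r"head\123", r"head\345", r"head\678"]

df2 = pd.DataFrame({
    "Name1": ["Matt"] * 6,
    "ID1": [1, 1, 1, 2, 2, 2],
    "Address": ["878 home"] * 3 + ["879 home"] * 3,
    "Name4": ["face1", "face2", "face3", "head1", "head2", "head3"],
    "Address2": address2,
})

df3 = pd.DataFrame({
    "Address2": address2,
    "Hash": ["abc123", "cde321", "efg123", "123efg", "efg321", "acd321"],
})

# First join on ID1, then attach the hashes via Address2.
combined_df = (
    df1.merge(df2, on="ID1", how="inner")
       .merge(df3, on="Address2", how="inner")
)

print(combined_df.shape)  # (6, 12)
```

The result has six rows, one per row of df2. It has twelve columns rather than the ten in the desired output, because Name1 and Address exist in both df1 and df2 and are not part of the join key, so pandas keeps both copies with `_x` and `_y` suffixes.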

Answer 1 (score: 0)

I think this will work. The merge function does what you need when you give it the columns you want to join on.

import numpy as np
import pandas as pd

data = np.array([['Name1','Name2','Name3','Address','ID1','ID2','Own'],
                 ['Matt','John1','Jill','878 home','1','0','Deal'],
                 ['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']])

data2 = np.array([['Name1','ID1','Address','Name4','Address2'],
                 ['Matt', '1','878 home','face1',"face.123"],
                 ['Matt', '1','878 home', 'face2','face.345'],
                  ['Matt', '1','878 home', 'face3', 'face.678'],
                  ['Matt', '2', '879 home', 'head1', 'head.123'],
                  ['Matt', '2', '879 home', 'head2',  'head.345'],
                  ['Matt', '2', '879 home', 'head3', 'head.678']])
#print(data)
data3 = np.array([['Address2','Hash'],
                 ['face.123', 'abc123'],
                ['face.345','cde321'],
                 ['face.678', 'efg123'],
                ['head.123', '123efg'],
                ['head.345', 'efg321'],
                ['head.678', 'acd321']])

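# The first row of each array holds the column names; the remaining rows are the records.
df1 = pd.DataFrame(data=data[1:, :], columns=data[0, :])
df2 = pd.DataFrame(data=data2[1:, :], columns=data2[0, :])
df3 = pd.DataFrame(data=data3[1:, :], columns=data3[0, :])

# Join df1 and df2 on ID1, then attach the hashes from df3 via Address2.
Cdf = pd.merge(df1, df2, on='ID1', how='inner')
Ddf = pd.merge(Cdf, df3, on='Address2', how='inner')
print(Ddf)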
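One side effect of building the frames from NumPy arrays of strings, as above, is that every column, including ID1 and ID2, holds text rather than numbers. If numeric IDs are wanted, a sketch along these lines could convert them after construction. The small `ids` array here is a hypothetical cut-down version of the data above, used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced version of the array used above: header row, then records.
ids = np.array([['Name1', 'ID1', 'ID2'],
                ['Matt', '1', '0'],
                ['Matt', '2', '1']])

df = pd.DataFrame(data=ids[1:, :], columns=ids[0, :])

# Convert the ID columns from text to integers; Name1 is left untouched.
df = df.astype({'ID1': int, 'ID2': int})

print(df.dtypes)
```

If one frame is converted, the frame it is merged with should be converted as well, so that the key columns have the same type on both sides of the join.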

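Answer 2 (score: 0)

Judging by your desired output, you do not need to specify anything beyond the default behaviour, which merges on the intersection of the column names.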

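Specifying a single column to merge on, as the accepted answer does, actually causes a problem: the other columns shared by df1 and df2 (Name1 and Address) come back as suffixed duplicates.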

>>> df1.merge(df2).merge(df3)

  Name1  Name2 Name3  Address  ID1  ID2   Own  Name4  Address2    Hash
0  Matt  John1  Jill  878 home    1    0  Deal  face1  face\123  abc123
1  Matt  John1  Jill  878 home    1    0  Deal  face2  face\345  cde321
2  Matt  John1  Jill  878 home    1    0  Deal  face3  face\678  efg123
3  Matt  John2  Jack  879 home    2    1  Dael  head1  head\123  123efg
4  Matt  John2  Jack  879 home    2    1  Dael  head2  head\345  efg321
5  Matt  John2  Jack  879 home    2    1  Dael  head3  head\678  acd321
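The difference between the two approaches can be seen by comparing the resulting column names. The sketch below uses trimmed-down versions of the question's frames (fewer columns, same keys), so the frames themselves are an assumption made for brevity.

```python
import pandas as pd

# Trimmed versions of the question's frames: only the columns needed to show the effect.
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "ID1": [1, 2],
    "Address": ["878 home", "879 home"],
    "Address2": [r"face\123", r"head\123"],
})
df3 = pd.DataFrame({
    "Address2": [r"face\123", r"head\123"],
    "Hash": ["abc123", "123efg"],
})

# Merging on ID1 only: Name1 and Address appear twice, with _x/_y suffixes.
on_id = df1.merge(df2, on="ID1")
print(list(on_id.columns))

# Default merge: all shared column names (Name1, Address, ID1) form the key,
# so each of them appears once.
default = df1.merge(df2).merge(df3)
print(list(default.columns))
```

Note that the default merge only gives the same rows as the ID1-only merge when the shared columns really agree between the two frames, as they do in the question's data. If Name1 or Address differed for the same ID1, those rows would be dropped by an inner join.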