Question

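How can I combine three dataframes like the ones shown below?

The first two must be joined on ID1, since that is the column that matches between the two dataframes.

For the third dataframe, Address2 must match so that the Hash value can be added.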

DF1:

Name1   Name2   Name3   Address    ID1   ID2   Own
Matt    John1   Jill    878 home   1     0     Deal
Matt    John2   Jack    879 home   2     1     Dael

DF2:

Name1   ID1   Address   Name4     Address2
Matt    1     878 home  face1     face\123
Matt    1     878 home  face2     face\345
Matt    1     878 home  face3     face\678    
Matt    2     879 home  head1     head\123
Matt    2     879 home  head2     head\345
Matt    2     879 home  head3     head\678

DF3:

Address2     Hash
face\123     abc123
face\345     cde321
face\678     efg123
head\123     123efg
head\345     efg321
head\678     acd321

I am trying to merge the three dataframes into something like this:

Name1   Name2   ID1 Address     Own    Name3    ID2 Name4   Address2    Hash
Matt    John1   1   878 home    Deal    Jill    0   face1   face\123    abc123
Matt    John1   1   878 home    Deal    Jill    0   face2   face\345    cde321
Matt    John1   1   878 home    Deal    Jill    0   face3   face\678    efg123
Matt    John2   2   879 home    Dael    Jack    1   head1   head\123    123efg
Matt    John2   2   879 home    Dael    Jack    1   head2   head\345    efg321
Matt    John2   2   879 home    Dael    Jack    1   head3   head\678    acd321

Between df1 and df2 the key is ID1; between df2 and df3 the key is Address2.

Thank you very much for your help.

Answer 1

Take a look at the merge function; its documentation includes some examples. For your specific problem, try the following:

combined_df = df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")
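To make the chained call above easier to try out, here is a minimal, self-contained sketch. The frames are rebuilt by hand from the question's tables and trimmed to a few columns (an assumption made to keep the example short); raw strings are used so the backslashes in Address2 are kept literally.

```python
import pandas as pd

# Illustrative frames rebuilt from the question's tables (trimmed columns).
df1 = pd.DataFrame({
    "Name2": ["John1", "John2"],
    "ID1": [1, 2],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "ID1": [1, 1, 1, 2, 2, 2],
    "Name4": ["face1", "face2", "face3", "head1", "head2", "head3"],
    "Address2": [r"face\123", r"face\345", r"face\678",
                 r"head\123", r"head\345", r"head\678"],
})
df3 = pd.DataFrame({
    "Address2": [r"face\123", r"face\345", r"face\678",
                 r"head\123", r"head\345", r"head\678"],
    "Hash": ["abc123", "cde321", "efg123", "123efg", "efg321", "acd321"],
})

combined_df = (
    df1.merge(df2, on="ID1", how="inner")       # df1 <-> df2 via ID1
       .merge(df3, on="Address2", how="inner")  # df2 <-> df3 via Address2
)
print(combined_df)
```

Each df1 row is repeated once per matching df2 row, so the result has six rows, and every row picks up the Hash whose Address2 matches.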

Answer 2

I think this will work. The merge function is a good fit for joining on the columns you want.

import numpy as np
import pandas as pd

data = np.array([['Name1','Name2','Name3','Address','ID1','ID2','Own'],
                 ['Matt','John1','Jill','878 home','1','0','Deal'],
                 ['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']])

data2 = np.array([['Name1','ID1','Address','Name4','Address2'],
                 ['Matt', '1','878 home','face1',"face.123"],
                 ['Matt', '1','878 home', 'face2','face.345'],
                  ['Matt', '1','878 home', 'face3', 'face.678'],
                  ['Matt', '2', '879 home', 'head1', 'head.123'],
                  ['Matt', '2', '879 home', 'head2',  'head.345'],
                  ['Matt', '2', '879 home', 'head3', 'head.678']])
#print(data)
data3 = np.array([['Address2','Hash'],
                 ['face.123', 'abc123'],
                ['face.345','cde321'],
                 ['face.678', 'efg123'],
                ['head.123', '123efg'],
                ['head.345', 'efg321'],
                ['head.678', 'acd321']])

df1 = pd.DataFrame(data=data[1:,:], columns=data[0,:])
df2 = pd.DataFrame(data=data2[1:,:], columns=data2[0,:])
df3 = pd.DataFrame(data=data3[1:,:], columns=data3[0,:])


# Join df1 and df2 on ID1, then attach the hashes from df3 via Address2.
Cdf = pd.merge(df1, df2, on='ID1', how='inner')
Ddf = pd.merge(Cdf, df3, on='Address2', how='inner')
print(Ddf)
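One thing to be aware of with this approach: building the frames from a NumPy array of strings makes every column hold strings, including ID1. That is fine as long as both frames are built the same way, but merging such a column against an integer ID1 from another source will typically fail or match nothing. A small sketch of checking and converting the key (the two-column array here is a made-up reduction of the data above):

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the array-based construction.
data = np.array([["ID1", "Own"], ["1", "Deal"], ["2", "Dael"]])
df = pd.DataFrame(data=data[1:, :], columns=data[0, :])

before = df["ID1"].tolist()
print(before)  # the IDs are strings at this point

# Convert the key to integers so it lines up with integer keys elsewhere.
df["ID1"] = df["ID1"].astype(int)
print(df["ID1"].tolist())
```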

Answer 3

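Judging from your desired output, you don't seem to need to specify anything beyond the default behavior, which merges on the intersection of the two frames' columns.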

Specifying a single column to merge on, as the accepted answer does, does cause a problem, because you end up with suffixed columns.

>>> df1.merge(df2).merge(df3)

  Name1  Name2 Name3  Address  ID1  ID2   Own  Name4  Address2    Hash
0  Matt  John1  Jill  878 home    1    0  Deal  face1  face\123  abc123
1  Matt  John1  Jill  878 home    1    0  Deal  face2  face\345  cde321
2  Matt  John1  Jill  878 home    1    0  Deal  face3  face\678  efg123
3  Matt  John2  Jack  879 home    2    1  Dael  head1  head\123  123efg
4  Matt  John2  Jack  879 home    2    1  Dael  head2  head\345  efg321
5  Matt  John2  Jack  879 home    2    1  Dael  head3  head\678  acd321
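The difference between the two approaches can be seen on a cut-down pair of frames. This is only a sketch with hand-built data (one df2 row per ID, to keep it short):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
})
df2 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "ID1": [1, 2],
    "Address": ["878 home", "879 home"],
    "Name4": ["face1", "head1"],
})

# Merging on ID1 alone: the other shared columns are duplicated with suffixes.
by_id = df1.merge(df2, on="ID1")
print(list(by_id.columns))
# ['Name1_x', 'Name2', 'Address_x', 'ID1', 'Name1_y', 'Address_y', 'Name4']

# Default merge: every shared column (Name1, Address, ID1) is used as a key.
by_common = df1.merge(df2)
print(list(by_common.columns))
# ['Name1', 'Name2', 'Address', 'ID1', 'Name4']
```

Note that the default only gives the same rows when the shared columns actually agree in both frames; if, say, Address differed between df1 and df2 for the same ID1, those rows would drop out of an inner merge.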
