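How can I combine three data frames like the ones shown below?
The first two must be joined on ID1, since that is the column that relates them.
The third data frame must then be matched on Address2 so that the Hash value can be added.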
DF1:
Name1 Name2 Name3 Address ID1 ID2 Own
Matt John1 Jill 878 home 1 0 Deal
Matt John2 Jack 879 home 2 1 Dael
DF2:
Name1 ID1 Address Name4 Address2
Matt 1 878 home face1 face\123
Matt 1 878 home face2 face\345
Matt 1 878 home face3 face\678
Matt 2 879 home head1 head\123
Matt 2 879 home head2 head\345
Matt 2 879 home head3 head\678
DF3:
Address2 Hash
face\123 abc123
face\345 cde321
face\678 efg123
head\123 123efg
head\345 efg321
head\678 acd321
I am trying to combine the three data frames into something like this:
Name1 Name2 ID1 Address Own Name3 ID2 Name4 Address2 Hash
Matt John1 1 878 home Deal Jill 0 face1 face\123 abc123
Matt John1 1 878 home Deal Jill 0 face2 face\345 cde321
Matt John1 1 878 home Deal Jill 0 face3 face\678 efg123
Matt John2 2 879 home Dael Jack 1 head1 head\123 123efg
Matt John2 2 879 home Dael Jack 1 head2 head\345 efg321
Matt John2 2 879 home Dael Jack 1 head3 head\678 acd321
Between df1 and df2 the key is ID1; between df2 and df3 the key is Address2.
Thank you very much for your help.
Answer 0 (score: 1)
Take a look at the merge function; you can find some examples here. For your specific problem, try the following:
combined_df = df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")
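For reference, here is a minimal sketch of that chained merge run end to end on the sample data from the question. The frame construction is an assumption about how the data is loaded, the Address2 values are written as raw strings so that Python does not read `\123` as an escape sequence, and the `validate` arguments are an optional extra (not part of the answer above) that make pandas raise an error if the keys do not have the relationship the question describes.

```python
import pandas as pd

# Sample data from the question; raw strings keep the backslashes literal.
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "ID2": [0, 1],
    "Own": ["Deal", "Dael"],
})
addresses = [r"face\123", r"face\345", r"face\678",
             r"head\123", r"head\345", r"head\678"]
df2 = pd.DataFrame({
    "Name1": ["Matt"] * 6,
    "ID1": [1, 1, 1, 2, 2, 2],
    "Address": ["878 home"] * 3 + ["879 home"] * 3,
    "Name4": ["face1", "face2", "face3", "head1", "head2", "head3"],
    "Address2": addresses,
})
df3 = pd.DataFrame({
    "Address2": addresses,
    "Hash": ["abc123", "cde321", "efg123", "123efg", "efg321", "acd321"],
})

combined_df = (
    # Each ID1 appears once in df1 and several times in df2.
    df1.merge(df2, on="ID1", how="inner", validate="one_to_many")
    # Each Address2 appears once in df3.
       .merge(df3, on="Address2", how="inner", validate="many_to_one")
)
print(combined_df[["ID1", "Name2", "Name4", "Address2", "Hash"]])
```

Because only ID1 is named as the key in the first merge, the other columns that df1 and df2 share (Name1 and Address) are kept from both sides with `_x` and `_y` suffixes.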
Answer 1 (score: 0)
I think this will work. The merge function is a good fit for joining on the columns you want.
import numpy as np
import pandas as pd
data = np.array([['Name1','Name2','Name3','Address','ID1','ID2','Own'],
['Matt','John1','Jill','878 home','1','0','Deal'],
['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']])
data2 = np.array([['Name1','ID1','Address','Name4','Address2'],
['Matt', '1','878 home','face1',"face.123"],
['Matt', '1','878 home', 'face2','face.345'],
['Matt', '1','878 home', 'face3', 'face.678'],
['Matt', '2', '879 home', 'head1', 'head.123'],
['Matt', '2', '879 home', 'head2', 'head.345'],
['Matt', '2', '879 home', 'head3', 'head.678']])
data3 = np.array([['Address2','Hash'],
['face.123', 'abc123'],
['face.345','cde321'],
['face.678', 'efg123'],
['head.123', '123efg'],
['head.345', 'efg321'],
['head.678', 'acd321']])
df1 = pd.DataFrame(data=data[1:,:], columns=data[0,:])
df2 = pd.DataFrame(data=data2[1:,:], columns=data2[0,:])
df3 = pd.DataFrame(data=data3[1:,:], columns=data3[0,:])
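# Join df1 to df2 on ID1. Name1 and Address also exist in both frames,
# so they come back as Name1_x/Name1_y and Address_x/Address_y.
merged_12 = pd.merge(df1, df2, on='ID1', how='inner')
# Attach the hash for each Address2 value.
merged_all = pd.merge(merged_12, df3, on='Address2', how='inner')
print(merged_all)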
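Merging on ID1 alone leaves duplicated Name1 and Address columns, and the column order differs from the desired output in the question. One possible cleanup is sketched below on a one-row version of the data (the names `merged`, `tidy` and `wanted` are just illustrative): drop the right-hand duplicates, strip the suffix from the left-hand ones, then select the columns in the order the question asks for.

```python
import pandas as pd

# One-row stand-ins for the three frames, using the same column names.
df1 = pd.DataFrame({"Name1": ["Matt"], "Name2": ["John1"], "Name3": ["Jill"],
                    "Address": ["878 home"], "ID1": ["1"], "ID2": ["0"],
                    "Own": ["Deal"]})
df2 = pd.DataFrame({"Name1": ["Matt"], "ID1": ["1"], "Address": ["878 home"],
                    "Name4": ["face1"], "Address2": ["face.123"]})
df3 = pd.DataFrame({"Address2": ["face.123"], "Hash": ["abc123"]})

merged = pd.merge(pd.merge(df1, df2, on="ID1"), df3, on="Address2")

# Drop the right-hand duplicates and rename the left-hand ones back.
tidy = (merged.drop(columns=["Name1_y", "Address_y"])
              .rename(columns={"Name1_x": "Name1", "Address_x": "Address"}))

# Column order taken from the desired output in the question.
wanted = ["Name1", "Name2", "ID1", "Address", "Own",
          "Name3", "ID2", "Name4", "Address2", "Hash"]
tidy = tidy[wanted]
print(tidy)
```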
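Answer 2 (score: 0)
Judging by your desired output, you don't need to specify anything beyond the default behaviour, which merges on the columns the frames have in common.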
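Specifying a single column to merge on, as the accepted answer does, actually causes a problem here, because you end up with suffixed columns.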
>>> df1.merge(df2).merge(df3)
Name1 Name2 Name3 Address ID1 ID2 Own Name4 Address2 Hash
0 Matt John1 Jill 878 home 1 0 Deal face1 face\123 abc123
1 Matt John1 Jill 878 home 1 0 Deal face2 face\345 cde321
2 Matt John1 Jill 878 home 1 0 Deal face3 face\678 efg123
3 Matt John2 Jack 879 home 2 1 Dael head1 head\123 123efg
4 Matt John2 Jack 879 home 2 1 Dael head2 head\345 efg321
5 Matt John2 Jack 879 home 2 1 Dael head3 head\678 acd321
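The difference between the two approaches can be seen by comparing column names. The sketch below uses a cut-down version of the question's data (fewer rows and columns, chosen only for illustration) and merges it both ways.

```python
import pandas as pd

df1 = pd.DataFrame({"Name1": ["Matt", "Matt"], "Name2": ["John1", "John2"],
                    "Address": ["878 home", "879 home"], "ID1": [1, 2]})
df2 = pd.DataFrame({"Name1": ["Matt", "Matt"], "ID1": [1, 2],
                    "Address": ["878 home", "879 home"],
                    "Address2": [r"face\123", r"head\123"]})
df3 = pd.DataFrame({"Address2": [r"face\123", r"head\123"],
                    "Hash": ["abc123", "123efg"]})

# Key given explicitly: the other shared columns are kept twice, with suffixes.
by_id = df1.merge(df2, on="ID1").merge(df3, on="Address2")
print(list(by_id.columns))

# No key given: pandas joins on every column name the two frames share.
by_shared = df1.merge(df2).merge(df3)
print(list(by_shared.columns))
```

Relying on the default is convenient but implicit. If the frames might later gain another shared column, passing the keys as a list, for example `on=["Name1", "ID1", "Address"]`, keeps the join explicit while still avoiding the suffixes.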