I have two data frames, for example:
import pandas as pd

d = {'CD': ['LO6000', 'TBLITIE', 'UUUU'], 'REGN': ['Colorado', 'Colorado', 'Colorado'], 'rev_1': [1179.49, 2110.00, 23.54]}
df = pd.DataFrame(data=d)
        CD      REGN    rev_1
0   LO6000  Colorado  1179.49
1  TBLITIE  Colorado  2110.00
2     UUUU  Colorado    23.54
and
d = {'CD': ['LO6000', 'TBLITIE'], 'REGN': ['Colorado', 'Colorado'], 'rev_2': [356, 9503]}
df = pd.DataFrame(data=d)
        CD      REGN  rev_2
0   LO6000  Colorado    356
1  TBLITIE  Colorado   9503
I would like to match them on the CD and REGN columns to get a data frame like this:
d = {'CD': ['LO6000', 'TBLITIE', 'UUUU'], 'REGN': ['Colorado', 'Colorado', 'Colorado'], 'rev_1': [1179.49, 2110.00, 23.54], 'rev_2': [356.00, 9503.00, float('nan')]}
df = pd.DataFrame(data=d)
        CD      REGN    rev_1   rev_2
0   LO6000  Colorado  1179.49   356.0
1  TBLITIE  Colorado  2110.00  9503.0
2     UUUU  Colorado    23.54     NaN
Answer 0 (score: 1)
Given
import pandas as pd

d1 = {'CD': ['LO6000', 'TBLITIE', 'UUUU'], 'REGN': ['Colorado', 'Colorado', 'Colorado'], 'rev_1': [1179.49, 2110.00, 23.54]}
df1 = pd.DataFrame(data=d1)
d2 = {'CD': ['LO6000', 'TBLITIE'], 'REGN': ['Colorado', 'Colorado'], 'rev_2': [356, 9503]}
df2 = pd.DataFrame(data=d2)
then
df = pd.merge(left=df1, right=df2, how="left", on=["CD", "REGN"])
Output:
        CD      REGN    rev_1   rev_2
0   LO6000  Colorado  1179.49   356.0
1  TBLITIE  Colorado  2110.00  9503.0
2     UUUU  Colorado    23.54     NaN
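
If it helps to see which rows found no partner, the same left merge can be run with the indicator and validate options of merge. This is only a sketch on the sample data from the question; the names checked, unmatched and result are made up for illustration, and the one-to-one check assumes each (CD, REGN) pair appears at most once in each frame.

```python
import pandas as pd

df1 = pd.DataFrame({
    "CD": ["LO6000", "TBLITIE", "UUUU"],
    "REGN": ["Colorado", "Colorado", "Colorado"],
    "rev_1": [1179.49, 2110.00, 23.54],
})
df2 = pd.DataFrame({
    "CD": ["LO6000", "TBLITIE"],
    "REGN": ["Colorado", "Colorado"],
    "rev_2": [356, 9503],
})

# validate="one_to_one" raises pandas.errors.MergeError if a (CD, REGN)
# pair is duplicated on either side, which would otherwise multiply rows.
# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge.
checked = df1.merge(
    df2,
    how="left",
    on=["CD", "REGN"],
    validate="one_to_one",
    indicator=True,
)

# CD values from df1 that had no matching row in df2
unmatched = checked.loc[checked["_merge"] == "left_only", "CD"].tolist()
print(unmatched)

# Drop the helper column to get the same frame as the plain merge
result = checked.drop(columns="_merge")
print(result)
```

Note that rev_2 comes back as float because the missing value is stored as NaN, which is why 356 prints as 356.0.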