pandas: join two DataFrames on matching columns, keep the unmatched rows, and fill the blanks with NaN

Time: 2020-08-25 22:49:29

Tags: python pandas

I have two DataFrames, such as:

import pandas as pd

d = {'CD': ['LO6000', 'TBLITIE', 'UUUU'], 'REGN': ['Colorado', 'Colorado', 'Colorado'], 'rev_1': [1179.49, 2110.00, 23.54]}
df = pd.DataFrame(data=d)


        CD      REGN    rev_1
0   LO6000  Colorado  1179.49
1  TBLITIE  Colorado  2110.00
2     UUUU  Colorado    23.54

d = {'CD': ['LO6000', 'TBLITIE'], 'REGN': ['Colorado', 'Colorado'], 'rev_2': [356, 9503]}
df = pd.DataFrame(data=d)

        CD      REGN  rev_2
0   LO6000  Colorado    356
1  TBLITIE  Colorado   9503

and I would like to match them on the CD and REGN columns to get a DataFrame like this:

# float('nan') is a real missing value; the string 'nan' would turn rev_2 into an object column
d = {'CD': ['LO6000', 'TBLITIE', 'UUUU'], 'REGN': ['Colorado', 'Colorado', 'Colorado'], 'rev_1': [1179.49, 2110.00, 23.54], 'rev_2': [356.00, 9503.00, float('nan')]}
df = pd.DataFrame(data=d)


        CD      REGN    rev_1   rev_2
0   LO6000  Colorado  1179.49   356.0
1  TBLITIE  Colorado  2110.00  9503.0
2     UUUU  Colorado    23.54     NaN

1 answer:

Answer 0 (score: 1)

Given

import pandas as pd

d1 = {'CD': ['LO6000', 'TBLITIE', 'UUUU'], 'REGN': ['Colorado', 'Colorado', 'Colorado'], 'rev_1': [1179.49, 2110.00, 23.54]}
df1 = pd.DataFrame(data=d1)


d2 = {'CD': ['LO6000', 'TBLITIE'], 'REGN': ['Colorado', 'Colorado'], 'rev_2': [356, 9503]}
df2 = pd.DataFrame(data=d2)

then a left merge on both key columns keeps every row of df1:

df = pd.merge(left=df1, right=df2, how="left", on=["CD", "REGN"])

Output:

        CD      REGN    rev_1   rev_2
0   LO6000  Colorado  1179.49   356.0
1  TBLITIE  Colorado  2110.00  9503.0
2     UUUU  Colorado    23.54     NaN
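
Two details of this result may be worth checking: which rows found no partner, and why rev_2 now prints as a float. The sketch below is not part of the original answer. It rebuilds the same two frames and uses the `indicator=True` option of `merge` to label each row's origin. The names `merged`, `unmatched`, and `rev_2_int` are chosen here for illustration only, and the conversion to the nullable `Int64` dtype is an optional extra, assuming you would prefer integers with a missing marker over floats.

```python
import pandas as pd

df1 = pd.DataFrame({
    "CD": ["LO6000", "TBLITIE", "UUUU"],
    "REGN": ["Colorado", "Colorado", "Colorado"],
    "rev_1": [1179.49, 2110.00, 23.54],
})
df2 = pd.DataFrame({
    "CD": ["LO6000", "TBLITIE"],
    "REGN": ["Colorado", "Colorado"],
    "rev_2": [356, 9503],
})

# how="left" keeps every row of df1; indicator=True adds a "_merge" column
# whose value is "both" for matched rows and "left_only" for unmatched ones.
merged = df1.merge(df2, how="left", on=["CD", "REGN"], indicator=True)

# Rows of df1 that had no matching (CD, REGN) pair in df2.
unmatched = merged[merged["_merge"] == "left_only"]

# rev_2 was upcast from int64 to float64 so that it can hold NaN.
# Optional: a nullable integer column that shows <NA> instead of NaN.
merged["rev_2_int"] = merged["rev_2"].astype("Int64")

print(merged)
print(unmatched[["CD", "REGN"]])
```

If the missing entries should hold a default value rather than NaN, calling `fillna` on the rev_2 column after the merge is the usual follow-up.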