pandas performs a concat instead of a merge when merging

Time: 2020-08-02 20:33:26

Tags: python pandas

I have two dataframes, df1:

                      name              mark
     0               Alex  [Tue, 0.0, 10, 0.0, 0.0]
     1               John  [Tue, 0.0, 10, 0.0, 0.0]
     2                Tom  [Tue, 0.0, 10, 0.0, 0.0]
     3                Tim  [Tue, 0.0, 10, 0.0, 0.0]

and df2:

           name        mark1
   0       Alex  [11.0, 0.0, 1.0]
   1       John  [12.0, 0.0, 4.0]
   2        Tom  [12.0, 0.0, 4.0]

When I do:

merged = pd.merge(df1, df2, how='outer', on='name').fillna(0)

I expect to get something like this:

                 name              mark                   mark1
     0           Alex  [Tue, 0.0, 10, 0.0, 0.0]   [11.0, 0.0, 1.0]
     1           John  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
     2            Tom  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
     3            Tim  [Tue, 0.0, 10, 0.0, 0.0]        0

But instead I get something like this (it looks more like a concat):

       name                      mark             mark1
    0  Alex  [Tue, 0.0, 10, 0.0, 0.0]                 0
    1  John  [Tue, 0.0, 10, 0.0, 0.0]                 0
    2   Tom  [Tue, 0.0, 10, 0.0, 0.0]                 0
    3   Tim  [Tue, 0.0, 10, 0.0, 0.0]                 0
    4  Alex                         0  [11.0, 0.0, 1.0]
    5  John                         0  [12.0, 0.0, 4.0]
    6   Tom                         0  [12.0, 0.0, 4.0]

Can someone tell me what I'm doing wrong? Here is my full workflow. I start from this dataframe, df:

        name                          mark
     0  Alex  [Mon, 10.12, 12, 10.0, 17.0]
     1  Alex  [Wed, 10.12, 15, 10.0, 17.0]
     2  Alex   [Fri, 10.12, 7, 10.0, 17.0]
     3  Alex      [Tue, 0.0, 10, 0.0, 0.0]
     4  Alex      [Thu, 0.0, 16, 0.0, 0.0]
     5  Alex       [Sat, 0.0, 2, 0.0, 0.0]
     6  Alex      [Sun, 0.0, 12, 0.0, 0.0]
     7  John   [Fri, 10.12, 7, 10.0, 17.0]
     8  John  [Mon, 10.12, 12, 10.0, 17.0]
     9  John      [Tue, 0.0, 10, 0.0, 0.0]
    10  John      [Wed, 0.0, 15, 0.0, 0.0]
    11  John      [Thu, 0.0, 16, 0.0, 0.0]
    12  John       [Sat, 0.0, 2, 0.0, 0.0]
    13  John      [Sun, 0.0, 12, 0.0, 0.0]
    14   Tom  [Wed, 10.12, 15, 10.0, 17.0]
    15   Tom  [Mon, 10.12, 12, 10.0, 17.0]
    16   Tom   [Fri, 10.12, 7, 10.0, 17.0]
    17   Tom      [Tue, 0.0, 10, 0.0, 0.0]
    18   Tom      [Thu, 0.0, 16, 0.0, 0.0]
    19   Tom       [Sat, 0.0, 2, 0.0, 0.0]
    20   Tom      [Sun, 0.0, 12, 0.0, 0.0]
    21   Tim  [Mon, 10.12, 12, 10.0, 17.0]
    22   Tim   [Fri, 10.12, 7, 10.0, 17.0]
    23   Tim      [Tue, 0.0, 10, 0.0, 0.0]
    24   Tim      [Wed, 0.0, 15, 0.0, 0.0]
    25   Tim      [Thu, 0.0, 16, 0.0, 0.0]
    26   Tim       [Sat, 0.0, 2, 0.0, 0.0]
    27   Tim      [Sun, 0.0, 12, 0.0, 0.0]

Then I do:

df = (df.groupby(['name'])['mark']
        .apply(list)  # collect each name's mark lists into one list
        .apply(lambda x: [list(y) for y in set([tuple(j) for j in x])])  # drop duplicates via hashable tuples
        .reset_index())

This gives me:

                  name                                               mark
 0               Alex  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
 1               John  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
 2                Tom  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
 3                Tim  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
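The deduplicating groupby step above can be tried on a small frame. This is a minimal sketch on hypothetical data (four rows, with Alex's Tuesday mark duplicated); the pipeline itself is the asker's, only split across lines. Lists are unhashable, so each mark is converted to a tuple for the set and back to a list afterwards, and the set does not keep the original order.

```python
import pandas as pd

# Hypothetical sample in the question's shape: one mark list per row,
# with Alex's Tuesday entry duplicated.
df = pd.DataFrame({
    "name": ["Alex", "Alex", "Alex", "John"],
    "mark": [
        ["Mon", 10.12, 12, 10.0, 17.0],
        ["Tue", 0.0, 10, 0.0, 0.0],
        ["Tue", 0.0, 10, 0.0, 0.0],  # duplicate of the row above
        ["Fri", 10.12, 7, 10.0, 17.0],
    ],
})

grouped = (
    df.groupby(["name"])["mark"]
    .apply(list)  # one list of mark lists per name
    # lists are unhashable, so dedupe via tuples, then turn them back into lists;
    # set() does not preserve order, so the inner lists come back in arbitrary order
    .apply(lambda x: [list(y) for y in set(tuple(j) for j in x)])
    .reset_index()
)

print(grouped)
```

If the original order matters, iterating over dict.fromkeys(tuple(j) for j in x) instead of the set keeps first-seen order (dicts preserve insertion order since Python 3.7); that is a swapped-in variant, not the asker's code.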

The second dataframe is produced in a similar way. (Sorry for not posting the exact dataframe; it is a bit messy.)
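For reference: with the sample frames exactly as posted, how='outer' would already give the expected four rows, because every name in df2 also appears in df1. A seven-row result (4 + 3) means that none of the name keys matched. A minimal sketch of one way this can happen, assuming (hypothetically; the question does not confirm it) that the names in the second frame carry a trailing space:

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Alex", "John", "Tom", "Tim"],
    "mark": [["Tue", 0.0, 10, 0.0, 0.0] for _ in range(4)],
})
# HYPOTHETICAL: the names in df2 end with a space, so they look identical
# to df1's names when printed but do not compare equal.
df2 = pd.DataFrame({
    "name": ["Alex ", "John ", "Tom "],
    "mark1": [[11.0, 0.0, 1.0], [12.0, 0.0, 4.0], [12.0, 0.0, 4.0]],
})

# No key matches, so the outer join keeps all 4 + 3 rows unpaired.
broken = pd.merge(df1, df2, how="outer", on="name").fillna(0)
print(len(broken))  # 7

# repr() makes the invisible difference visible.
print(df2["name"].map(repr).tolist())  # ["'Alex '", "'John '", "'Tom '"]

# After normalising the keys, the same merge pairs the rows up.
df2["name"] = df2["name"].str.strip()
fixed = pd.merge(df1, df2, how="outer", on="name").fillna(0)
print(len(fixed))  # 4
```

Row order in the outer result depends on the pandas version (newer releases sort the keys of an outer join), so only the row count is relied on here.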

2 answers:

Answer 0 (score: 1)

Do a left merge on the name column:

df1.merge(df2, how='left', on='name')

  name                     mark               mark1
0  Alex  [Tue, 0.0, 10, 0.0, 0.0]   [11.0, 0.0, 1.0]
1  John  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
2   Tom  [Tue, 0.0, 10, 0.0, 0.0]   [12.0, 0.0, 4.0]
3   Tim  [Tue, 0.0, 10, 0.0, 0.0]                NaN
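The output above can be reproduced from the sample frames in the question. This is a sketch that rebuilds the frames by hand and assumes the name values match exactly:

```python
import pandas as pd

# Sample frames from the question, assuming the names match exactly.
df1 = pd.DataFrame({
    "name": ["Alex", "John", "Tom", "Tim"],
    "mark": [["Tue", 0.0, 10, 0.0, 0.0] for _ in range(4)],
})
df2 = pd.DataFrame({
    "name": ["Alex", "John", "Tom"],
    "mark1": [[11.0, 0.0, 1.0], [12.0, 0.0, 4.0], [12.0, 0.0, 4.0]],
})

# how="left": every df1 row is kept in df1's order; Tim has no partner in df2,
# so his mark1 is NaN.
left = df1.merge(df2, how="left", on="name")
print(left["name"].tolist())          # ['Alex', 'John', 'Tom', 'Tim']
print(left["mark1"].isna().tolist())  # [False, False, False, True]
```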

Answer 1 (score: 1)

The following should work:

merged = df1.merge(df2, how='left', on='name').fillna(0)

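The problem is that your merge performs a full outer join (how='outer'), which keeps unmatched rows from both frames, whereas how='left' keeps only the rows of df1.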
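The choice between how='outer' and how='left' only changes the result for names that occur in df2 but not in df1. A minimal check on the question's sample frames, again assuming the names match exactly, using merge's indicator=True parameter to label where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Alex", "John", "Tom", "Tim"],
    "mark": [["Tue", 0.0, 10, 0.0, 0.0] for _ in range(4)],
})
df2 = pd.DataFrame({
    "name": ["Alex", "John", "Tom"],
    "mark1": [[11.0, 0.0, 1.0], [12.0, 0.0, 4.0], [12.0, 0.0, 4.0]],
})

# indicator=True adds a categorical "_merge" column:
# "both", "left_only" (only in df1) or "right_only" (only in df2).
outer = df1.merge(df2, how="outer", on="name", indicator=True)
left = df1.merge(df2, how="left", on="name", indicator=True)

# df2's names are a subset of df1's, so both joins contain the same names ...
print(sorted(outer["name"]) == sorted(left["name"]))  # True
# ... and the outer join has no df2-only rows to add.
print((outer["_merge"] == "right_only").sum())  # 0

# The answer's version: left join, then replace Tim's missing mark1 with 0.
merged = df1.merge(df2, how="left", on="name").fillna(0)
```

On the real data, any right_only rows in the outer merge would point to names in the second frame that found no partner in df1.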