I have two data frames, df1:
   name  mark
0  Alex  [Tue, 0.0, 10, 0.0, 0.0]
1  John  [Tue, 0.0, 10, 0.0, 0.0]
2  Tom   [Tue, 0.0, 10, 0.0, 0.0]
3  Tim   [Tue, 0.0, 10, 0.0, 0.0]
and df2:
   name  mark1
0  Alex  [11.0, 0.0, 1.0]
1  John  [12.0, 0.0, 4.0]
2  Tom   [12.0, 0.0, 4.0]
When I do this:
merged = pd.merge(df1,df2,how='outer',on='name').fillna(0)
I expect to get something like this:
   name  mark                      mark1
0  Alex  [Tue, 0.0, 10, 0.0, 0.0]  [11.0, 0.0, 1.0]
1  John  [Tue, 0.0, 10, 0.0, 0.0]  [12.0, 0.0, 4.0]
2  Tom   [Tue, 0.0, 10, 0.0, 0.0]  [12.0, 0.0, 4.0]
3  Tim   [Tue, 0.0, 10, 0.0, 0.0]  0
But I get something like this instead (it looks more like a concat):
   name  mark                      mark1
0  Alex  [Tue, 0.0, 10, 0.0, 0.0]  0
1  John  [Tue, 0.0, 10, 0.0, 0.0]  0
2  Tom   [Tue, 0.0, 10, 0.0, 0.0]  0
3  Tim   [Tue, 0.0, 10, 0.0, 0.0]  0
4  Alex  0                         [11.0, 0.0, 1.0]
5  John  0                         [12.0, 0.0, 4.0]
6  Tom   0                         [12.0, 0.0, 4.0]
Can someone tell me what I am doing wrong? Here is everything I have, starting from this data:
    name  mark
0   Alex  [Mon, 10.12, 12, 10.0, 17.0]
1   Alex  [Wed, 10.12, 15, 10.0, 17.0]
2   Alex  [Fri, 10.12, 7, 10.0, 17.0]
3   Alex  [Tue, 0.0, 10, 0.0, 0.0]
4   Alex  [Thu, 0.0, 16, 0.0, 0.0]
5   Alex  [Sat, 0.0, 2, 0.0, 0.0]
6   Alex  [Sun, 0.0, 12, 0.0, 0.0]
7   John  [Fri, 10.12, 7, 10.0, 17.0]
8   John  [Mon, 10.12, 12, 10.0, 17.0]
9   John  [Tue, 0.0, 10, 0.0, 0.0]
10  John  [Wed, 0.0, 15, 0.0, 0.0]
11  John  [Thu, 0.0, 16, 0.0, 0.0]
12  John  [Sat, 0.0, 2, 0.0, 0.0]
13  John  [Sun, 0.0, 12, 0.0, 0.0]
14  Tom   [Wed, 10.12, 15, 10.0, 17.0]
15  Tom   [Mon, 10.12, 12, 10.0, 17.0]
16  Tom   [Fri, 10.12, 7, 10.0, 17.0]
17  Tom   [Tue, 0.0, 10, 0.0, 0.0]
18  Tom   [Thu, 0.0, 16, 0.0, 0.0]
19  Tom   [Sat, 0.0, 2, 0.0, 0.0]
20  Tom   [Sun, 0.0, 12, 0.0, 0.0]
21  Tim   [Mon, 10.12, 12, 10.0, 17.0]
22  Tim   [Fri, 10.12, 7, 10.0, 17.0]
23  Tim   [Tue, 0.0, 10, 0.0, 0.0]
24  Tim   [Wed, 0.0, 15, 0.0, 0.0]
25  Tim   [Thu, 0.0, 16, 0.0, 0.0]
26  Tim   [Sat, 0.0, 2, 0.0, 0.0]
27  Tim   [Sun, 0.0, 12, 0.0, 0.0]
Then I do:
df = (
    df.groupby(['name'])['mark']
      .apply(list)
      # drop duplicate marks within each name (a set does not preserve order)
      .apply(lambda x: [list(y) for y in {tuple(j) for j in x}])
      .reset_index()
)
This gives me:
   name  mark
0  Alex  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
1  John  [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
2  Tom   [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
3  Tim   [[Tue, 0.0, 10, 0.0, 0.0], [Sun, 0.0, 12, 0.0,...
The second data frame is obtained in a similar way. (Sorry for not posting the exact data frames; they are a bit messy.)
Answer 0 (score: 1)
Do a left merge on the 'name' column:
df1.merge(df2, how='left',on='name')
   name  mark                      mark1
0  Alex  [Tue, 0.0, 10, 0.0, 0.0]  [11.0, 0.0, 1.0]
1  John  [Tue, 0.0, 10, 0.0, 0.0]  [12.0, 0.0, 4.0]
2  Tom   [Tue, 0.0, 10, 0.0, 0.0]  [12.0, 0.0, 4.0]
3  Tim   [Tue, 0.0, 10, 0.0, 0.0]  NaN
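For anyone who wants to reproduce this, here is a minimal sketch of the left merge. The two frames are rebuilt by hand from the small df1 and df2 shown in the question, so the construction code is an assumption about how the data looks, not the asker's actual pipeline.

```python
import pandas as pd

# Hand-built copies of the small df1 / df2 from the question (assumed shape).
df1 = pd.DataFrame({
    "name": ["Alex", "John", "Tom", "Tim"],
    "mark": [["Tue", 0.0, 10, 0.0, 0.0] for _ in range(4)],
})
df2 = pd.DataFrame({
    "name": ["Alex", "John", "Tom"],
    "mark1": [[11.0, 0.0, 1.0], [12.0, 0.0, 4.0], [12.0, 0.0, 4.0]],
})

# A left merge keeps every row of df1; names missing from df2 get NaN in mark1.
merged = df1.merge(df2, how="left", on="name")
print(merged)
```

Tim has no row in df2, so his mark1 stays NaN until it is filled explicitly.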
Answer 1 (score: 1)
The following should work:
merged = df1.merge(df2, how='left',on='name').fillna(0)
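This is because your merge performs a full outer join, which keeps the rows of both frames even when their keys have no match; a left join keeps only the rows of df1.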
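One thing worth checking: an outer join only produces the concat-like result from the question when the 'name' values in the two frames do not compare equal. The sketch below shows one way to diagnose that with the indicator flag of pd.merge. The trailing space in df2's names is purely a hypothetical cause invented for illustration; the question does not say what the real mismatch is.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Alex", "John", "Tom", "Tim"],
    "mark": [["Tue", 0.0, 10, 0.0, 0.0] for _ in range(4)],
})
# Hypothetical mismatch: every name in df2 carries a trailing space.
df2 = pd.DataFrame({
    "name": ["Alex ", "John ", "Tom "],
    "mark1": [[11.0, 0.0, 1.0], [12.0, 0.0, 4.0], [12.0, 0.0, 4.0]],
})

# indicator=True adds a "_merge" column: left_only, right_only or both.
check = pd.merge(df1, df2, how="outer", on="name", indicator=True)
n_matched = int((check["_merge"] == "both").sum())
print(len(check), "rows,", n_matched, "matched on name")

# Normalise the key, then merge again; names now line up.
df2_clean = df2.assign(name=df2["name"].str.strip())
fixed = pd.merge(df1, df2_clean, how="outer", on="name").fillna(0)
print(fixed)
```

If the diagnostic reports no matched rows, switching to a left join alone would leave mark1 empty for every name, so the keys need cleaning first.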