df1:
0 17.12.2014 13:56:56 1.9
1 17.12.2014 13:56:58 3.1
2 17.12.2014 13:56:59 2.8
3 17.12.2014 13:57:10 2.3
4 17.12.2014 13:57:11 3.1
df1.shape is about 3000 (rows)
df2:
Time Value
1 17.12.2014 13:55:56 2.9
2 17.12.2014 13:55:58 6.0
3 17.12.2014 13:55:58 3.6
4 17.12.2014 13:55:59 2.8
5 17.12.2014 13:56:07 1.9
6 17.12.2014 13:56:12 2.9
7 17.12.2014 13:56:12 3.0
8 17.12.2014 13:56:13 1.8
9 17.12.2014 13:56:15 2.2
10 17.12.2014 13:56:15 2.0
11 17.12.2014 13:56:41 1.7
12 17.12.2014 13:56:41 2.4
13 17.12.2014 13:56:42 2.8
14 17.12.2014 13:56:42 1.9
15 17.12.2014 13:56:43 2.8
16 17.12.2014 13:56:43 1.7
17 17.12.2014 13:56:44 2.8
18 17.12.2014 13:56:45 1.7
19 17.12.2014 13:56:59 2.8
20 17.12.2014 14:03:08 1.7
df2.shape is about 20000 (rows)
df3:
1 17.12.2014 13:56:12 3.2
df3.shape is about 5000 (rows)
I need a result DataFrame like the one shown below, and going by the size of df2, the result DataFrame should have about 20000 rows:
Time Value1 Value2 Value3
1 17.12.2014 13:55:56 NaN 2.9 NaN
2 17.12.2014 13:55:58 NaN 6.0 NaN
3 17.12.2014 13:55:58 NaN 3.6 NaN
4 17.12.2014 13:55:59 NaN 2.8 NaN
5 17.12.2014 13:56:07 NaN 1.9 NaN
6 17.12.2014 13:56:12 NaN 2.9 NaN
7 17.12.2014 13:56:12 NaN 3.0 3.2
8 17.12.2014 13:56:13 NaN 1.8 NaN
9 17.12.2014 13:56:15 NaN 2.2 NaN
10 17.12.2014 13:56:15 NaN 2.0 NaN
11 17.12.2014 13:56:41 NaN 1.7 NaN
12 17.12.2014 13:56:41 NaN 2.4 NaN
13 17.12.2014 13:56:42 NaN 2.8 NaN
14 17.12.2014 13:56:42 NaN 1.9 NaN
15 17.12.2014 13:56:43 NaN 2.8 NaN
16 17.12.2014 13:56:43 NaN 1.7 NaN
17 17.12.2014 13:56:44 NaN 2.8 NaN
18 17.12.2014 13:56:45 NaN 1.7 NaN
19 17.12.2014 13:56:56 1.9 NaN NaN
20 17.12.2014 13:56:58 3.1 NaN NaN
21 17.12.2014 13:56:59 2.8 2.8 NaN
22 17.12.2014 13:57:10 2.3 NaN NaN
23 17.12.2014 13:57:11 3.1 NaN NaN
20 17.12.2014 14:03:08 NaN 1.7 NaN
Thanks!
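Two details of these inputs matter whichever combining method is used: every frame calls its data column Value, and DataFrame.join refuses overlapping column names unless suffixes are given; and Time is stored as day-first text, which sorts lexicographically rather than chronologically. A minimal preparation sketch, assuming each frame has exactly the columns Time and Value (the three-row sample frames and the helper name prepare are hypothetical):

```python
import pandas as pd

# Small stand-ins for the question's frames (a few rows copied from above)
df1 = pd.DataFrame({'Time': ['17.12.2014 13:56:56', '17.12.2014 13:56:59'],
                    'Value': [1.9, 2.8]})
df2 = pd.DataFrame({'Time': ['17.12.2014 13:55:56', '17.12.2014 13:56:59'],
                    'Value': [2.9, 2.8]})
df3 = pd.DataFrame({'Time': ['17.12.2014 13:56:12'], 'Value': [3.2]})


def prepare(df, name):
    """Hypothetical helper: parse 'Time' as a datetime and rename 'Value'."""
    out = df.copy()
    # Day-first format, e.g. '17.12.2014 13:56:56'
    out['Time'] = pd.to_datetime(out['Time'], format='%d.%m.%Y %H:%M:%S')
    return out.rename(columns={'Value': name})


# Value -> Value1, Value2, Value3, matching the desired result's columns
frames = [prepare(df, f'Value{i}') for i, df in enumerate([df1, df2, df3], start=1)]
print([list(f.columns) for f in frames])
```

The prepared frames can then be indexed with set_index('Time') and joined, or merged on 'Time'; with real datetimes the combined result sorts in time order even across different days.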
Answer 1 (score: 1)
Set the index to Time, then use an outer join. You can use reduce from functools to simplify the syntax.
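from functools import reduce
# Assumes the value columns have distinct names (e.g. Val1, Val2, Val3);
# join raises a ValueError on overlapping column names without suffixes.
reduce(lambda l, r: l.join(r, how='outer'), [df.set_index('Time') for df in [df1, df2, df3]])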
Val1 Val2 Val3
Time
17.12.2014 13:55:56 NaN 2.9 NaN
17.12.2014 13:55:58 NaN 6.0 NaN
17.12.2014 13:55:58 NaN 3.6 NaN
17.12.2014 13:55:59 NaN 2.8 NaN
17.12.2014 13:56:07 NaN 1.9 NaN
17.12.2014 13:56:12 NaN 2.9 3.2
17.12.2014 13:56:12 NaN 3.0 3.2
17.12.2014 13:56:13 NaN 1.8 NaN
17.12.2014 13:56:15 NaN 2.2 NaN
17.12.2014 13:56:15 NaN 2.0 NaN
17.12.2014 13:56:41 NaN 1.7 NaN
17.12.2014 13:56:41 NaN 2.4 NaN
17.12.2014 13:56:42 NaN 2.8 NaN
17.12.2014 13:56:42 NaN 1.9 NaN
17.12.2014 13:56:43 NaN 2.8 NaN
17.12.2014 13:56:43 NaN 1.7 NaN
17.12.2014 13:56:44 NaN 2.8 NaN
17.12.2014 13:56:45 NaN 1.7 NaN
17.12.2014 13:56:56 1.9 NaN NaN
17.12.2014 13:56:58 3.1 NaN NaN
17.12.2014 13:56:59 2.8 2.8 NaN
17.12.2014 13:57:10 2.3 NaN NaN
17.12.2014 13:57:11 3.1 NaN NaN
17.12.2014 14:03:08 NaN 1.7 NaN
Note that in the input you provided, df2 has two entries at 17.12.2014 13:56:12, so the value from df3 is carried onto both of those rows.
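A runnable sketch of the reduce/outer-join approach on a few rows of the question's data, with the value columns already named Val1 to Val3 as in the output above (the row subset is illustrative). It also reproduces the duplicate-timestamp effect from the note: df3's single 13:56:12 value lands on both of df2's 13:56:12 rows.

```python
from functools import reduce

import pandas as pd

# Small subsets of the question's data; value columns already have distinct names
df1 = pd.DataFrame({'Time': ['17.12.2014 13:56:56', '17.12.2014 13:56:59'],
                    'Val1': [1.9, 2.8]})
df2 = pd.DataFrame({'Time': ['17.12.2014 13:56:12', '17.12.2014 13:56:12',
                             '17.12.2014 13:56:59'],
                    'Val2': [2.9, 3.0, 2.8]})
df3 = pd.DataFrame({'Time': ['17.12.2014 13:56:12'], 'Val3': [3.2]})

# Fold the frames together pairwise: ((df1 join df2) join df3), all outer joins
result = reduce(lambda l, r: l.join(r, how='outer'),
                [df.set_index('Time') for df in [df1, df2, df3]])
print(result)
```

Because df2 has two rows at 13:56:12 and df3 has one, the join matches df3's row against each of them, so the result has four rows rather than three distinct times.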
Answer 2 (score: 1)
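The join method was built for exactly this situation: you can join any number of DataFrames with it. The calling DataFrame is joined on the index of each DataFrame in the passed collection, so to work with multiple DataFrames you must put the join column in the index.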
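# Put 'Time' in the index of every frame (value columns should have distinct names)
dfs = [df1, df2, df3]
dfs = [df.set_index('Time') for df in dfs]
# how='outer' keeps the times from every frame; the default, how='left', keeps only df1's
dfs[0].join(dfs[1:], how='outer')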
I learned this from @Ted Petrou while taking an online course.
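A small sketch contrasting join's default with an explicit outer join when a list of frames is passed. The times are shortened to HH:MM:SS and the columns named Value1 to Value3 purely for brevity; both are assumptions, not the question's data.

```python
import pandas as pd

# Tiny frames with distinct value-column names and unique 'Time' values
df1 = pd.DataFrame({'Time': ['13:56:56', '13:56:59'], 'Value1': [1.9, 2.8]})
df2 = pd.DataFrame({'Time': ['13:55:56', '13:56:59'], 'Value2': [2.9, 2.8]})
df3 = pd.DataFrame({'Time': ['13:56:12'], 'Value3': [3.2]})

dfs = [df.set_index('Time') for df in [df1, df2, df3]]

left = dfs[0].join(dfs[1:])                # default how='left': only df1's times
outer = dfs[0].join(dfs[1:], how='outer')  # union of the times in all three frames

print(len(left), len(outer))
```

Here the left join keeps 2 rows (df1's times) while the outer join keeps all 4 distinct times, which is the layout the question asks for.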
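Using merge:
# The second merge also needs how='outer'; its default (inner) keeps only times present in df3
df1.merge(df2, on='Time', how='outer').merge(df3, on='Time', how='outer')
or
import pandas as pd
pd.merge(pd.merge(df1, df2, on='Time', how='outer'), df3, on='Time', how='outer')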
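A sketch of the merge approach on a few of the question's rows. It renames Value to Value1, Value2 and Value3 first so the output matches the requested layout; that renaming step is an addition, not part of the original answer. Both merges use how='outer', since merge defaults to how='inner', which on the second merge would keep only the times present in df3.

```python
import pandas as pd

# Subsets of the question's frames, all with columns Time and Value
df1 = pd.DataFrame({'Time': ['17.12.2014 13:56:56', '17.12.2014 13:56:59'],
                    'Value': [1.9, 2.8]})
df2 = pd.DataFrame({'Time': ['17.12.2014 13:56:12', '17.12.2014 13:56:12',
                             '17.12.2014 13:56:59'],
                    'Value': [2.9, 3.0, 2.8]})
df3 = pd.DataFrame({'Time': ['17.12.2014 13:56:12'], 'Value': [3.2]})

# Rename first so the result has Value1/Value2/Value3 instead of Value_x/Value_y/Value
a, b, c = (df.rename(columns={'Value': f'Value{i}'})
           for i, df in enumerate([df1, df2, df3], start=1))

# Chain two outer merges on the 'Time' column
result = a.merge(b, on='Time', how='outer').merge(c, on='Time', how='outer')
print(result)
```

Unlike join, merge works on a regular column, so no set_index call is needed; the duplicate 13:56:12 rows in df2 again each receive df3's value.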