How to merge time-series pandas DataFrames without losing rows?

Posted: 2019-01-08 16:04:34

Tags: python pandas datetime dataframe time-series


  1. How can I merge time-series DataFrames without losing any rows?
  2. The shape of the final DataFrame should follow whichever input DataFrame is the largest.

df1:

0  17.12.2014 13:56:56                        1.9
1  17.12.2014 13:56:58                        3.1
2  17.12.2014 13:56:59                        2.8
3  17.12.2014 13:57:10                        2.3
4  17.12.2014 13:57:11                        3.1

df1 has about 3,000 rows.

df2:
       Time                                    Value
1   17.12.2014 13:55:56                        2.9
2   17.12.2014 13:55:58                        6.0
3   17.12.2014 13:55:58                        3.6
4   17.12.2014 13:55:59                        2.8
5   17.12.2014 13:56:07                        1.9
6   17.12.2014 13:56:12                        2.9
7   17.12.2014 13:56:12                        3.0
8   17.12.2014 13:56:13                        1.8
9   17.12.2014 13:56:15                        2.2
10  17.12.2014 13:56:15                        2.0
11  17.12.2014 13:56:41                        1.7
12  17.12.2014 13:56:41                        2.4
13  17.12.2014 13:56:42                        2.8
14  17.12.2014 13:56:42                        1.9
15  17.12.2014 13:56:43                        2.8
16  17.12.2014 13:56:43                        1.7
17  17.12.2014 13:56:44                        2.8
18  17.12.2014 13:56:45                        1.7
19  17.12.2014 13:56:59                        2.8
20  17.12.2014 14:03:08                        1.7

df2 has about 20,000 rows.

df3:

1   17.12.2014 13:56:12                        3.2

df3 has about 5,000 rows.
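To experiment with the approaches in the answers, the sample rows can be loaded with their timestamps parsed. This is a minimal sketch, assuming the columns are named `Time` and `Value` (as in the df2 excerpt) and that the dates are day-first (`dd.mm.yyyy`); the helper `load` is hypothetical. Parsing matters because day-first strings such as "01.01.2015" sort before "17.12.2014" as text, so string timestamps do not order chronologically across months or years.

```python
import pandas as pd

# Excerpts of df1 and df3 from the question; the column names are assumed.
raw1 = {
    "Time": ["17.12.2014 13:56:56", "17.12.2014 13:56:58", "17.12.2014 13:56:59",
             "17.12.2014 13:57:10", "17.12.2014 13:57:11"],
    "Value": [1.9, 3.1, 2.8, 2.3, 3.1],
}
raw3 = {"Time": ["17.12.2014 13:56:12"], "Value": [3.2]}


def load(raw):
    """Hypothetical helper: build a frame and parse day-first timestamps."""
    df = pd.DataFrame(raw)
    # An explicit format avoids any day-first vs month-first ambiguity
    df["Time"] = pd.to_datetime(df["Time"], format="%d.%m.%Y %H:%M:%S")
    return df


df1 = load(raw1)
df3 = load(raw3)
print(df1.dtypes)
```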

I need a result DataFrame like the one below; based on the size of df2, the result should have about 20,000 rows:

    Time                   Value1          Value2                       Value3                
1   17.12.2014 13:55:56        NaN             2.9                            NaN                    
2   17.12.2014 13:55:58        NaN             6.0                            NaN                    
3   17.12.2014 13:55:58        NaN             3.6                            NaN                    
4   17.12.2014 13:55:59        NaN             2.8                            NaN                    
5   17.12.2014 13:56:07        NaN             1.9                            NaN                    
6   17.12.2014 13:56:12        NaN             2.9                            NaN                    
7   17.12.2014 13:56:12        NaN             3.0                            3.2                    
8   17.12.2014 13:56:13        NaN             1.8                            NaN                    
9   17.12.2014 13:56:15        NaN             2.2                            NaN                    
10  17.12.2014 13:56:15        NaN             2.0                            NaN                    
11  17.12.2014 13:56:41        NaN             1.7                            NaN                    
12  17.12.2014 13:56:41        NaN             2.4                            NaN                    
13  17.12.2014 13:56:42        NaN             2.8                            NaN                    
14  17.12.2014 13:56:42        NaN             1.9                            NaN                    
15  17.12.2014 13:56:43        NaN             2.8                            NaN                    
16  17.12.2014 13:56:43        NaN             1.7                            NaN                    
17  17.12.2014 13:56:44        NaN             2.8                            NaN                    
18  17.12.2014 13:56:45        NaN             1.7                            NaN       
19  17.12.2014 13:56:56        1.9             NaN                            NaN
20  17.12.2014 13:56:58        3.1             NaN                            NaN
21  17.12.2014 13:56:59        2.8             2.8                            NaN
22  17.12.2014 13:57:10        2.3             NaN                            NaN
23  17.12.2014 13:57:11        3.1             NaN                            NaN
24  17.12.2014 14:03:08        NaN             1.7                            NaN

Thanks

3 Answers

Answer 0 (score: 1)

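I think what you want is an outer join (`how='outer'`).

This performs a full outer join. You can change `outer` to `left` or `right` to get a left or right outer join instead.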
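The code from this answer did not survive. A minimal sketch of the idea, assuming `Time` has already been parsed to datetimes and each frame's value column has a distinct name (`Value1`, `Value2`, `Value3` are illustrative), chains `DataFrame.merge` with `how="outer"` at every step so that no timestamp is dropped:

```python
import pandas as pd

# Small excerpts of the question's data; the column names are assumed.
df1 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:56:56", "2014-12-17 13:56:59"]),
                    "Value1": [1.9, 2.8]})
df2 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:56:12", "2014-12-17 13:56:12",
                                            "2014-12-17 13:56:59"]),
                    "Value2": [2.9, 3.0, 2.8]})
df3 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:56:12"]),
                    "Value3": [3.2]})

# how="outer" on every step keeps rows whose Time appears in only one frame
merged = (df1.merge(df2, on="Time", how="outer")
             .merge(df3, on="Time", how="outer")
             .sort_values("Time", kind="mergesort", ignore_index=True))
print(merged)
```

With `how="left"` at each step, only the timestamps of the left-hand frame would survive, which is why the outer variant fits a "no lost rows" requirement.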

Answer 1 (score: 1)

Set the index to `Time`, then join with `how='outer'`. You can use `reduce` from `functools` to simplify the syntax.

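from functools import reduce

# Assumes each frame has a 'Time' column and a uniquely named value column
# (e.g. Val1, Val2, Val3); join() raises on overlapping column names.
reduce(lambda l, r: l.join(r, how='outer'),
       [df.set_index('Time') for df in [df1, df2, df3]])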

Output:

                     Val1  Val2  Val3
Time                                 
17.12.2014 13:55:56   NaN   2.9   NaN
17.12.2014 13:55:58   NaN   6.0   NaN
17.12.2014 13:55:58   NaN   3.6   NaN
17.12.2014 13:55:59   NaN   2.8   NaN
17.12.2014 13:56:07   NaN   1.9   NaN
17.12.2014 13:56:12   NaN   2.9   3.2
17.12.2014 13:56:12   NaN   3.0   3.2
17.12.2014 13:56:13   NaN   1.8   NaN
17.12.2014 13:56:15   NaN   2.2   NaN
17.12.2014 13:56:15   NaN   2.0   NaN
17.12.2014 13:56:41   NaN   1.7   NaN
17.12.2014 13:56:41   NaN   2.4   NaN
17.12.2014 13:56:42   NaN   2.8   NaN
17.12.2014 13:56:42   NaN   1.9   NaN
17.12.2014 13:56:43   NaN   2.8   NaN
17.12.2014 13:56:43   NaN   1.7   NaN
17.12.2014 13:56:44   NaN   2.8   NaN
17.12.2014 13:56:45   NaN   1.7   NaN
17.12.2014 13:56:56   1.9   NaN   NaN
17.12.2014 13:56:58   3.1   NaN   NaN
17.12.2014 13:56:59   2.8   2.8   NaN
17.12.2014 13:57:10   2.3   NaN   NaN
17.12.2014 13:57:11   3.1   NaN   NaN
17.12.2014 14:03:08   NaN   1.7   NaN

Note that in the input you provided, df2 has two entries at 17.12.2014 13:56:12, so the value from df3 is carried over to both of those rows.
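Both the duplicate-timestamp behaviour and the column naming can be checked on a small excerpt. A minimal sketch, assuming each input has `Time` and `Value` columns; the helper `frame` is hypothetical, and the columns are renamed to `Val1`..`Val3` because `join` raises a `ValueError` when column names overlap and no suffix is given:

```python
from functools import reduce

import pandas as pd


def frame(times, values):
    """Hypothetical helper: build a frame shaped like the question's data."""
    return pd.DataFrame({"Time": pd.to_datetime(times, format="%d.%m.%Y %H:%M:%S"),
                         "Value": values})


df1 = frame(["17.12.2014 13:56:56", "17.12.2014 13:56:59"], [1.9, 2.8])
df2 = frame(["17.12.2014 13:56:12", "17.12.2014 13:56:12", "17.12.2014 13:56:59"],
            [2.9, 3.0, 2.8])
df3 = frame(["17.12.2014 13:56:12"], [3.2])

# Give each 'Value' column a distinct name, then index every frame on Time
indexed = [df.rename(columns={"Value": f"Val{i}"}).set_index("Time")
           for i, df in enumerate([df1, df2, df3], start=1)]

# Outer-join pairwise; df3's single 13:56:12 row matches both df2 rows there
result = reduce(lambda l, r: l.join(r, how="outer"), indexed)
print(result)
```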

Answer 2 (score: 1)

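The `join` method was built for exactly these situations: you can join any number of DataFrames with it. The calling DataFrame is joined on its index with the index of each DataFrame in the passed collection, so to join multiple DataFrames you must first move the join column into the index.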

dfs = [df1, df2, df3]
dfs = [df.set_index('Time') for df in dfs]
dfs[0].join(dfs[1:], how='outer')  # the default how='left' would keep only df1's times

I learned this from @Ted Petrou while taking an online course.
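`join` defaults to `how='left'`, so when it is given a list of frames only the caller's index survives unless `how='outer'` is passed. A minimal sketch on a tiny excerpt compares the two (the column names `Val1`..`Val3` are assumed, since `join` does not accept suffixes when joining a list of frames):

```python
import pandas as pd

# Tiny excerpts with distinct value-column names (assumed)
df1 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:56:56", "2014-12-17 13:56:59"]),
                    "Val1": [1.9, 2.8]})
df2 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:55:56", "2014-12-17 13:56:59"]),
                    "Val2": [2.9, 2.8]})
df3 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:56:12"]),
                    "Val3": [3.2]})

dfs = [df.set_index("Time") for df in [df1, df2, df3]]

left = dfs[0].join(dfs[1:])                # default how="left": only df1's times survive
outer = dfs[0].join(dfs[1:], how="outer")  # union of the times in all three frames
print(len(left), len(outer))               # prints "2 4"
```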

Using merge:

df1.merge(df2, on='Time', how='outer').merge(df3, on='Time', how='outer')

or

pd.merge(pd.merge(df1, df2, on='Time', how='outer'), df3, on='Time', how='outer')
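`merge` defaults to `how='inner'`, so every step of the chain needs `how='outer'` to keep all rows. A minimal sketch on a tiny excerpt where df3's only timestamp appears in neither df1 nor df2 (values chosen for illustration) makes the difference visible; because all three frames use a column named `Value`, the first merge adds `_x`/`_y` suffixes:

```python
import pandas as pd

df1 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:56:56", "2014-12-17 13:56:59"]),
                    "Value": [1.9, 2.8]})
df2 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:55:56", "2014-12-17 13:56:59"]),
                    "Value": [2.9, 2.8]})
df3 = pd.DataFrame({"Time": pd.to_datetime(["2014-12-17 13:56:12"]),
                    "Value": [3.2]})

step1 = df1.merge(df2, on="Time", how="outer")     # 3 distinct times, columns Value_x / Value_y
inner = step1.merge(df3, on="Time")                # default how="inner": no shared times, 0 rows
outer = step1.merge(df3, on="Time", how="outer")   # all 4 distinct times kept
print(len(inner), len(outer))                      # prints "0 4"
```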