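Aligning time series datasets with missing values for plotting

Time: 2019-12-22 06:51:37

Tags: python pandas numpy matplotlib

I have three datasets with missing values, each consisting of a time column and a data column. The minimum time difference between two rows is 1 second (00:00:01):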

Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81                          00:00:00    70
00:00:01    81                      
00:00:02    81                      
00:00:03    81                          00:00:03    99
00:00:04    81                          00:00:04    100
00:00:05    80      00:00:05    80      00:00:05    101
00:00:06    80      00:00:06    100         
                    00:00:07    92      00:00:07    88
00:00:08    83      00:00:08    80      00:00:08    88
00:00:09    84      00:00:09    83      00:00:09    87
00:00:10    86                      
00:00:11    89                      
00:00:12    90                      
00:00:13    92                          00:00:13    92
00:00:14    94                          00:00:14    94
00:00:15    94      00:00:15    96      00:00:15    93
00:00:16    96      00:00:16    97          
00:00:17    98      00:00:17    100     00:00:17    99
00:00:18    100                         00:00:18    99
00:00:19    101                         00:00:19    101
00:00:20    103                     

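For clarity, the table above shows blank fields where values are missing. The actual data is dense, with no placeholder rows, and looks like this: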

Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81      00:00:05    80      00:00:00    70
00:00:01    81      00:00:06    100     00:00:03    99
00:00:02    81      00:00:07    92      00:00:04    100
00:00:03    81      00:00:08    80      00:00:05    101
00:00:04    81      00:00:09    83      00:00:07    88
00:00:05    80      00:00:15    96      00:00:08    88
00:00:06    80      00:00:16    97      00:00:09    87
00:00:08    83      00:00:17    100     00:00:13    92
00:00:09    84                          00:00:14    94
00:00:10    86                          00:00:15    93
00:00:11    89                          00:00:17    99
00:00:12    90                          00:00:18    99
00:00:13    92                          00:00:19    101
00:00:14    94                      
00:00:15    94                      
00:00:16    96                      
00:00:17    98                      
00:00:18    100                     
00:00:19    101                     
00:00:20    103                     

Now I would like to align the data so that it can be plotted like this:

Combined

And like this:

Split

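My naive approach would be:

  1. Find the minimum and maximum time in each dataset.
  2. Create a table with one row per time step and three columns, each initialized to n/a.
  3. Iterate over each dataset and assign its values to the table.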
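
Sketched in pandas, these steps might look roughly as follows. The column names `time` and `val`, the sample values, and the names `datasets`, `parsed`, and `table` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data; the column names "time" and "val" are assumptions.
df1 = pd.DataFrame({"time": ["00:00:00", "00:00:01", "00:00:03"], "val": [81, 81, 81]})
df2 = pd.DataFrame({"time": ["00:00:01", "00:00:04"], "val": [80, 100]})
datasets = {"a": df1, "b": df2}

# Step 1: find the overall minimum and maximum time.
parsed = {k: pd.to_timedelta(d["time"]) for k, d in datasets.items()}
start = min(p.min() for p in parsed.values())
end = max(p.max() for p in parsed.values())

# Step 2: one row per second, one column per dataset, every value NaN.
index = pd.timedelta_range(start, end, freq="s")
table = pd.DataFrame(index=index, columns=list(datasets), dtype="float64")

# Step 3: assign each dataset's values to its column.
# Assigning a Series aligns on the index, so absent times stay NaN.
for k, d in datasets.items():
    table[k] = pd.Series(d["val"].to_numpy(), index=pd.TimedeltaIndex(parsed[k]))

print(table)
```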

Is there a Python function or library that performs these steps efficiently? Or is there a better way to do this?

Regards

1 answer:

Answer 0: (score: 3)

You can concat all the DataFrames, using the time column as the index:

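import pandas as pd

# df1, df2, df3 each hold a 'time' column and a 'val' column
dfs = [df1, df2, df3]
# set_index('time') lets concat align the rows on time; gaps become NaN
df = pd.concat([x.set_index('time')['val'] for x in dfs],
                axis=1,
                keys=['a','b','c'],
                sort=True)
print(df)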
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
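
For a version that runs on its own, here is a minimal sketch using small excerpts of the question's data. The column names `time` and `val` are assumed, as in the snippet above, and the name `aligned` is only illustrative.

```python
import pandas as pd

# Small hypothetical excerpts of the question's data; the column names
# "time" and "val" are assumptions.
df1 = pd.DataFrame({"time": ["00:00:05", "00:00:06", "00:00:08"], "val": [80, 80, 83]})
df2 = pd.DataFrame({"time": ["00:00:05", "00:00:06", "00:00:07"], "val": [80, 100, 92]})
df3 = pd.DataFrame({"time": ["00:00:05", "00:00:07", "00:00:08"], "val": [101, 88, 88]})

# Each frame becomes a Series indexed by time; concat along axis=1 aligns
# them on the union of all index values and fills the gaps with NaN.
aligned = pd.concat(
    [d.set_index("time")["val"] for d in (df1, df2, df3)],
    axis=1,
    keys=["a", "b", "c"],
    sort=True,
)
print(aligned)
```

Because the alignment is done on index labels, the inputs do not need to have the same length or start at the same time.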

If some times are missing from every DataFrame, add DataFrame.asfreq, which requires a DatetimeIndex:

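# parse the time strings so that the index becomes a DatetimeIndex
df.index = pd.to_datetime(df.index)
# 's' = one row per second (the older alias 'S' is deprecated in recent pandas)
df = df.asfreq('s')
# keep only the time of day again
df.index = df.index.time
print(df)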
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
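
The sample data has no second that is missing from all three datasets, so the output above is unchanged. As a rough illustration of what asfreq adds, here is a sketch with a hypothetical frame in which 00:00:02 is absent from every column. The explicit `format` argument is an assumption about how the time strings look.

```python
import pandas as pd

# Hypothetical aligned frame in which 00:00:02 is absent from every column.
df = pd.DataFrame(
    {"a": [81.0, 81.0, 81.0], "b": [float("nan"), 80.0, 92.0]},
    index=["00:00:00", "00:00:01", "00:00:03"],
)

# Parsing with an explicit format yields a DatetimeIndex on a placeholder date.
df.index = pd.to_datetime(df.index, format="%H:%M:%S")
# asfreq inserts an all-NaN row for every second missing from the index.
df = df.asfreq("s")
# Keep only the time of day for display.
df.index = df.index.time
print(df)
```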

Finally, plot with DataFrame.plot:

df.plot()

For a separate plot per column:

df.plot(subplots=True)
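
Both variants can be tried on a small frame with gaps, as in the sketch below. The sample values, the `marker` argument, and the non-interactive `Agg` backend are assumptions chosen so that the snippet runs without a display; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical aligned frame with gaps (NaN) in both columns.
nan = float("nan")
df = pd.DataFrame(
    {"a": [81.0, nan, 83.0, 84.0], "b": [80.0, 92.0, nan, 83.0]},
    index=pd.to_datetime(
        ["00:00:05", "00:00:07", "00:00:08", "00:00:09"], format="%H:%M:%S"
    ),
)

# Combined: all columns on one set of axes.
# Lines are not drawn across NaN, so markers keep isolated points visible.
ax = df.plot(marker="o")

# Split: one subplot per column, sharing the x axis.
axes = df.plot(subplots=True, sharex=True, marker="o")

plt.close("all")
```

With `subplots=True`, `DataFrame.plot` returns an array with one Axes object per column, which can be used to label or style each subplot separately.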