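I have three datasets with missing values. Each one consists of a time column and a data column. The minimum time difference between two rows is 1 second (00:00:01):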
Dataset 1:     Dataset 2:     Dataset 3:
00:00:00 81                   00:00:00 70
00:00:01 81
00:00:02 81
00:00:03 81                   00:00:03 99
00:00:04 81                   00:00:04 100
00:00:05 80    00:00:05 80    00:00:05 101
00:00:06 80    00:00:06 100
               00:00:07 92    00:00:07 88
00:00:08 83    00:00:08 80    00:00:08 88
00:00:09 84    00:00:09 83    00:00:09 87
00:00:10 86
00:00:11 89
00:00:12 90
00:00:13 92                   00:00:13 92
00:00:14 94                   00:00:14 94
00:00:15 94    00:00:15 96    00:00:15 93
00:00:16 96    00:00:16 97
00:00:17 98    00:00:17 100   00:00:17 99
00:00:18 100                  00:00:18 99
00:00:19 101                  00:00:19 101
00:00:20 103
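For clarity, the table above leaves blank fields where values are missing. The actual data is dense, with no blank rows, and looks like this: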
Dataset 1:     Dataset 2:     Dataset 3:
00:00:00 81    00:00:05 80    00:00:00 70
00:00:01 81    00:00:06 100   00:00:03 99
00:00:02 81    00:00:07 92    00:00:04 100
00:00:03 81    00:00:08 80    00:00:05 101
00:00:04 81    00:00:09 83    00:00:07 88
00:00:05 80    00:00:15 96    00:00:08 88
00:00:06 80    00:00:16 97    00:00:09 87
00:00:08 83    00:00:17 100   00:00:13 92
00:00:09 84                   00:00:14 94
00:00:10 86                   00:00:15 93
00:00:11 89                   00:00:17 99
00:00:12 90                   00:00:18 99
00:00:13 92                   00:00:19 101
00:00:14 94
00:00:15 94
00:00:16 96
00:00:17 98
00:00:18 100
00:00:19 101
00:00:20 103
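Now I want to align the data on a common time axis so that all three datasets can be plotted together (figures not shown).
My naive approach would be to build the full list of timestamps by hand and fill the gaps with n/a as values. Is there a Python function or library that performs these steps efficiently? Or is there a better way to do this?
Regards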
Answer 0 (score: 3)
You can set the time column as the index of each DataFrame and then combine them all with concat:
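import pandas as pd

# df1, df2 and df3 each have a 'time' column and a 'val' column
dfs = [df1, df2, df3]
# Index every frame by time, then place the 'val' Series side by side
df = pd.concat([x.set_index('time')['val'] for x in dfs],
               axis=1,
               keys=['a', 'b', 'c'],
               sort=True)
print(df)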
a b c
00:00:00 81.0 NaN 70.0
00:00:01 81.0 NaN NaN
00:00:02 81.0 NaN NaN
00:00:03 81.0 NaN 99.0
00:00:04 81.0 NaN 100.0
00:00:05 80.0 80.0 101.0
00:00:06 80.0 100.0 NaN
00:00:07 NaN 92.0 88.0
00:00:08 83.0 80.0 88.0
00:00:09 84.0 83.0 87.0
00:00:10 86.0 NaN NaN
00:00:11 89.0 NaN NaN
00:00:12 90.0 NaN NaN
00:00:13 92.0 NaN 92.0
00:00:14 94.0 NaN 94.0
00:00:15 94.0 96.0 93.0
00:00:16 96.0 97.0 NaN
00:00:17 98.0 100.0 99.0
00:00:18 100.0 NaN 99.0
00:00:19 101.0 NaN 101.0
00:00:20 103.0 NaN NaN
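For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same alignment. The three small frames are made up for illustration; only the column names 'time' and 'val' are taken from the snippet above.

```python
import pandas as pd

# Hypothetical sample frames (not the data from the question)
df1 = pd.DataFrame({'time': ['00:00:00', '00:00:01', '00:00:03'],
                    'val': [81, 81, 81]})
df2 = pd.DataFrame({'time': ['00:00:01', '00:00:03'],
                    'val': [80, 100]})
df3 = pd.DataFrame({'time': ['00:00:00', '00:00:04'],
                    'val': [70, 99]})

# concat along axis=1 aligns the Series on the union of their indexes;
# timestamps missing from a frame become NaN in that frame's column
aligned = pd.concat(
    [d.set_index('time')['val'] for d in (df1, df2, df3)],
    axis=1,
    keys=['a', 'b', 'c'],
    sort=True,
)
print(aligned)
```

Because NaN is a float, the integer columns are upcast to float in the result, which is why the output above shows 81.0 rather than 81.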
If some timestamps are missing from every DataFrame, also add DataFrame.asfreq. It requires a DatetimeIndex:
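# asfreq only works on a DatetimeIndex, so convert the time labels first
df.index = pd.to_datetime(df.index)
# One row per second ('s'; older pandas versions also accept 'S')
df = df.asfreq('s')
# Convert back to plain time-of-day labels
df.index = df.index.time
print(df)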
a b c
00:00:00 81.0 NaN 70.0
00:00:01 81.0 NaN NaN
00:00:02 81.0 NaN NaN
00:00:03 81.0 NaN 99.0
00:00:04 81.0 NaN 100.0
00:00:05 80.0 80.0 101.0
00:00:06 80.0 100.0 NaN
00:00:07 NaN 92.0 88.0
00:00:08 83.0 80.0 88.0
00:00:09 84.0 83.0 87.0
00:00:10 86.0 NaN NaN
00:00:11 89.0 NaN NaN
00:00:12 90.0 NaN NaN
00:00:13 92.0 NaN 92.0
00:00:14 94.0 NaN 94.0
00:00:15 94.0 96.0 93.0
00:00:16 96.0 97.0 NaN
00:00:17 98.0 100.0 99.0
00:00:18 100.0 NaN 99.0
00:00:19 101.0 NaN 101.0
00:00:20 103.0 NaN NaN
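In the sample data every second appears in at least one dataset, so the output is the same as before. The sketch below uses a made-up frame in which 00:00:02 is absent everywhere, to show the row that asfreq inserts. Passing an explicit format to pd.to_datetime is my own addition, not part of the answer.

```python
import pandas as pd

# Hypothetical aligned frame: 00:00:02 is missing from every column
df = pd.DataFrame(
    {'a': [81.0, 81.0, 83.0], 'b': [None, 80.0, 92.0]},
    index=['00:00:00', '00:00:01', '00:00:03'],
)

# asfreq needs a DatetimeIndex; an explicit format avoids parser guessing
df.index = pd.to_datetime(df.index, format='%H:%M:%S')
df = df.asfreq('s')        # add a NaN row for each missing second
df.index = df.index.time   # back to time-of-day labels
print(df)
```

The result has four rows, with an all-NaN row at 00:00:02.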
Finally, plot with DataFrame.plot:
df.plot()
To draw each column in its own subplot:
df.plot(subplots=True)