Show missing timestamps between a given start date and end date in Python

Time: 2018-02-28 12:47:21

Tags: python timestamp missing-data

The dataset has timestamps and temperature values. A start date and an end date are also given.

start_date = '22-02-2018 10:35:29'
end_date = '23-02-2018 10:34:29'

TIMESTAMP           Temp1   Temp2
22-02-2018 14:35    4.34    4.93
22-02-2018 14:36    4.35    5.02
22-02-2018 14:37    4.35    5.1
22-02-2018 14:39    4.31    5.23
22-02-2018 14:40    4.29    5.26
22-02-2018 14:41    4.26    5.24
22-02-2018 14:42    4.24    5.17
22-02-2018 14:47    4.09    4.64
22-02-2018 14:48    4.08    4.55
22-02-2018 14:49    4.08    4.48
22-02-2018 14:50    4.09    4.48
22-02-2018 14:51    4.11    4.5
22-02-2018 14:54    4.22    4.66
22-02-2018 14:55    4.25    4.72

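Some timestamps between the start date and the end date are missing. I want to add every missing timestamp from the start date to the end date, with NaN as the corresponding Temp1 and Temp2 values. The frequency is 60S.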

Desired result:

TIMESTAMP           Temp1   Temp2
22-02-2018 10:35    NaN     NaN
22-02-2018 10:36    NaN     NaN
22-02-2018 10:37    NaN     NaN
22-02-2018 10:38    NaN     NaN
22-02-2018 10:39    NaN     NaN
22-02-2018 10:40    NaN     NaN
22-02-2018 10:41    NaN     NaN
22-02-2018 10:42    NaN     NaN
.
.
.
22-02-2018 14:35    4.34    4.93
22-02-2018 14:36    4.35    5.02
22-02-2018 14:37    4.35    5.1
22-02-2018 14:38    NaN     NaN
22-02-2018 14:39    4.31    5.23
22-02-2018 14:40    4.29    5.26
22-02-2018 14:41    4.26    5.24
22-02-2018 14:42    4.24    5.17
22-02-2018 14:43    NaN     NaN
22-02-2018 14:44    NaN     NaN
22-02-2018 14:45    NaN     NaN
22-02-2018 14:46    NaN     NaN
22-02-2018 14:47    4.09    4.64
22-02-2018 14:48    4.08    4.55
22-02-2018 14:49    4.08    4.48

1 answer:

Answer 0: (score: 0)

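I think you first need a DatetimeIndex, and then reindex by a date_range. The generated values have to align with the index, so floor is added to truncate them to whole minutes: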

print (df.index)
DatetimeIndex(['2018-02-22 14:35:00', '2018-02-22 14:36:00',
               '2018-02-22 14:37:00', '2018-02-22 14:39:00',
               '2018-02-22 14:40:00', '2018-02-22 14:41:00',
               '2018-02-22 14:42:00', '2018-02-22 14:47:00',
               '2018-02-22 14:48:00', '2018-02-22 14:49:00',
               '2018-02-22 14:50:00', '2018-02-22 14:51:00',
               '2018-02-22 14:54:00', '2018-02-22 14:55:00'],
              dtype='datetime64[ns]', name='TIMESTAMP', freq=None)
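
For reference, an index like the one printed above can be produced from the question's TIMESTAMP column. This is a minimal sketch, assuming the data is already in a DataFrame with TIMESTAMP stored as day-first strings; only a few of the sample rows are reproduced here.

```python
import pandas as pd

# A few rows from the question's sample data (illustrative subset).
df = pd.DataFrame({
    "TIMESTAMP": ["22-02-2018 14:35", "22-02-2018 14:36",
                  "22-02-2018 14:37", "22-02-2018 14:39"],
    "Temp1": [4.34, 4.35, 4.35, 4.31],
    "Temp2": [4.93, 5.02, 5.10, 5.23],
})

# An explicit format makes sure 22-02 is read as day-month, never month-day.
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], format="%d-%m-%Y %H:%M")

# Move the parsed column into the index so reindex can align on it.
df = df.set_index("TIMESTAMP")
print(df.index)
```

Passing an explicit format is a design choice here: it avoids relying on pandas guessing the day/month order for dates such as 05-02-2018.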


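# One stamp per minute; floor drops the :29 seconds so the stamps match df.index
dates = pd.date_range(start_date, end_date, freq='60S').floor('60S')
# Timestamps absent from df are added with NaN in Temp1 and Temp2
df = df.reindex(dates)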
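
To see why floor matters: the start date carries 29 seconds, so date_range alone produces stamps at hh:mm:29, none of which equal the hh:mm:00 labels in df.index, and reindex would then return only NaN. The sketch below assumes a two-row stand-in for df and writes the frequency as lowercase '60s', which recent pandas releases prefer over '60S'.

```python
import pandas as pd

start_date = "22-02-2018 10:35:29"
end_date = "23-02-2018 10:34:29"

# An explicit format avoids any day/month ambiguity in the strings.
fmt = "%d-%m-%Y %H:%M:%S"
start = pd.to_datetime(start_date, format=fmt)
end = pd.to_datetime(end_date, format=fmt)

# Two-row stand-in for the question's data (illustrative only).
df = pd.DataFrame(
    {"Temp1": [4.34, 4.35], "Temp2": [4.93, 5.02]},
    index=pd.to_datetime(["2018-02-22 14:35", "2018-02-22 14:36"]),
)

raw = pd.date_range(start, end, freq="60s")  # 10:35:29, 10:36:29, ...
dates = raw.floor("60s")                     # 10:35:00, 10:36:00, ...

unaligned = df.reindex(raw)   # no label matches, so every row is NaN
aligned = df.reindex(dates)   # the two readings land on their minutes

# Count the non-NaN Temp1 values in each result.
print(unaligned["Temp1"].notna().sum(), aligned["Temp1"].notna().sum())  # prints: 0 2
```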

Verification with changed start and end dates:

start_date = '22-02-2018 14:32:29'
end_date = '22-02-2018 14:58:29'

dates = pd.date_range(start_date, end_date, freq='60S').floor('60S')
df = df.reindex(dates)
print (df)
                     Temp1  Temp2
2018-02-22 14:32:00    NaN    NaN
2018-02-22 14:33:00    NaN    NaN
2018-02-22 14:34:00    NaN    NaN
2018-02-22 14:35:00   4.34   4.93
2018-02-22 14:36:00   4.35   5.02
2018-02-22 14:37:00   4.35   5.10
2018-02-22 14:38:00    NaN    NaN
2018-02-22 14:39:00   4.31   5.23
2018-02-22 14:40:00   4.29   5.26
2018-02-22 14:41:00   4.26   5.24
2018-02-22 14:42:00   4.24   5.17
2018-02-22 14:43:00    NaN    NaN
2018-02-22 14:44:00    NaN    NaN
2018-02-22 14:45:00    NaN    NaN
2018-02-22 14:46:00    NaN    NaN
2018-02-22 14:47:00   4.09   4.64
2018-02-22 14:48:00   4.08   4.55
2018-02-22 14:49:00   4.08   4.48
2018-02-22 14:50:00   4.09   4.48
2018-02-22 14:51:00   4.11   4.50
2018-02-22 14:52:00    NaN    NaN
2018-02-22 14:53:00    NaN    NaN
2018-02-22 14:54:00   4.22   4.66
2018-02-22 14:55:00   4.25   4.72
2018-02-22 14:56:00    NaN    NaN
2018-02-22 14:57:00    NaN    NaN
2018-02-22 14:58:00    NaN    NaN
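
If the goal is only to list which timestamps are absent, rather than to pad the frame, one option is Index.difference between the full range and the existing index. This is a sketch, assuming a shortened copy of the sample index; the names `present`, `full`, and `missing` are introduced here for illustration.

```python
import pandas as pd

# Timestamps that exist in the data (shortened copy of the sample).
present = pd.to_datetime(
    ["22-02-2018 14:35", "22-02-2018 14:36", "22-02-2018 14:37",
     "22-02-2018 14:39", "22-02-2018 14:40", "22-02-2018 14:41",
     "22-02-2018 14:42", "22-02-2018 14:47"],
    format="%d-%m-%Y %H:%M",
)

# Every minute between the first and last reading.
full = pd.date_range(present.min(), present.max(), freq="60s")

# Minutes in the full range that have no reading.
missing = full.difference(present)
print(missing)
```

On a frame that has already been reindexed, a similar list could be read off with df.index[df.isna().all(axis=1)], though that would also include rows whose readings were genuinely NaN in the source data.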

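Another solution, especially if the index contains duplicated values, is to add the start and end datetimes and then resample with some aggregation function such as sum: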

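# Add all-NaN rows at both endpoints so resample covers the whole range
df.loc[pd.to_datetime(start_date)] = np.nan
df.loc[pd.to_datetime(end_date)] = np.nan
# min_count=1 keeps empty minutes as NaN; a plain sum() returns 0 for them
df = df.resample('60S').sum(min_count=1)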
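
The resample route can be sketched end to end as follows. The data is a hypothetical four-row sample with a duplicated 14:36 reading, and the start and end values are made up for the example; the endpoints are appended with pd.concat here instead of df.loc, which is a substitution for illustration. Note that reindex would raise an error on this frame because of the duplicated label, which is the case this alternative is meant for.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a duplicated 14:36 reading.
df = pd.DataFrame(
    {"Temp1": [4.34, 4.35, 4.35, 4.31], "Temp2": [4.93, 5.02, 5.02, 5.23]},
    index=pd.to_datetime(
        ["2018-02-22 14:35", "2018-02-22 14:36",
         "2018-02-22 14:36", "2018-02-22 14:39"]
    ),
)
start = pd.Timestamp("2018-02-22 14:33:29")
end = pd.Timestamp("2018-02-22 14:41:29")

# Append all-NaN rows at both endpoints so resample spans the full range.
edges = pd.DataFrame(np.nan, index=[start, end], columns=df.columns)
padded = pd.concat([df, edges]).sort_index()

# mean() collapses duplicates and leaves empty minutes as NaN.
by_mean = padded.resample("60s").mean()

# sum() adds duplicates together; min_count=1 keeps empty minutes as NaN
# instead of the default 0.
by_sum = padded.resample("60s").sum(min_count=1)

print(by_mean)
```

Which aggregation fits depends on what a duplicate means in the data: mean or first is usually closer to the intent for temperature readings, while sum adds the duplicated readings together.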