The dataset has timestamps and temperature values. A start date and an end date are also given.
start_date = '22-02-2018 10:35:29'
end_date = '23-02-2018 10:34:29'
TIMESTAMP Temp1 Temp2
22-02-2018 14:35 4.34 4.93
22-02-2018 14:36 4.35 5.02
22-02-2018 14:37 4.35 5.1
22-02-2018 14:39 4.31 5.23
22-02-2018 14:40 4.29 5.26
22-02-2018 14:41 4.26 5.24
22-02-2018 14:42 4.24 5.17
22-02-2018 14:47 4.09 4.64
22-02-2018 14:48 4.08 4.55
22-02-2018 14:49 4.08 4.48
22-02-2018 14:50 4.09 4.48
22-02-2018 14:51 4.11 4.5
22-02-2018 14:54 4.22 4.66
22-02-2018 14:55 4.25 4.72
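Some timestamps between the start date and the end date are missing. I want to add every missing timestamp from the start date to the end date, with NaN as the corresponding Temp1 and Temp2 values. The frequency is 60S.
Expected result: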
TIMESTAMP Temp1 Temp2
22-02-2018 10:35 NaN NaN
22-02-2018 10:36 NaN NaN
22-02-2018 10:37 NaN NaN
22-02-2018 10:38 NaN NaN
22-02-2018 10:39 NaN NaN
22-02-2018 10:40 NaN NaN
22-02-2018 10:41 NaN NaN
22-02-2018 10:42 NaN NaN
.
.
.
22-02-2018 14:35 4.34 4.93
22-02-2018 14:36 4.35 5.02
22-02-2018 14:37 4.35 5.1
22-02-2018 14:38 NaN NaN
22-02-2018 14:39 4.31 5.23
22-02-2018 14:40 4.29 5.26
22-02-2018 14:41 4.26 5.24
22-02-2018 14:42 4.24 5.17
22-02-2018 14:43 NaN NaN
22-02-2018 14:44 NaN NaN
22-02-2018 14:45 NaN NaN
22-02-2018 14:46 NaN NaN
22-02-2018 14:47 4.09 4.64
22-02-2018 14:48 4.08 4.55
22-02-2018 14:49 4.08 4.48
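Answer 0 (score: 0)
I think you first need a DatetimeIndex, then reindex by a date_range. The values must align with the index, so floor is added to truncate the generated timestamps to whole minutes: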
print (df.index)
DatetimeIndex(['2018-02-22 14:35:00', '2018-02-22 14:36:00',
'2018-02-22 14:37:00', '2018-02-22 14:39:00',
'2018-02-22 14:40:00', '2018-02-22 14:41:00',
'2018-02-22 14:42:00', '2018-02-22 14:47:00',
'2018-02-22 14:48:00', '2018-02-22 14:49:00',
'2018-02-22 14:50:00', '2018-02-22 14:51:00',
'2018-02-22 14:54:00', '2018-02-22 14:55:00'],
dtype='datetime64[ns]', name='TIMESTAMP', freq=None)
dates = pd.date_range(start_date, end_date, freq='60S').floor('60S')
df = df.reindex(dates)
Verification with changed start and end dates:
start_date = '22-02-2018 14:32:29'
end_date = '22-02-2018 14:58:29'
dates = pd.date_range(start_date, end_date, freq='60S').floor('60S')
df = df.reindex(dates)
print (df)
Temp1 Temp2
2018-02-22 14:32:00 NaN NaN
2018-02-22 14:33:00 NaN NaN
2018-02-22 14:34:00 NaN NaN
2018-02-22 14:35:00 4.34 4.93
2018-02-22 14:36:00 4.35 5.02
2018-02-22 14:37:00 4.35 5.10
2018-02-22 14:38:00 NaN NaN
2018-02-22 14:39:00 4.31 5.23
2018-02-22 14:40:00 4.29 5.26
2018-02-22 14:41:00 4.26 5.24
2018-02-22 14:42:00 4.24 5.17
2018-02-22 14:43:00 NaN NaN
2018-02-22 14:44:00 NaN NaN
2018-02-22 14:45:00 NaN NaN
2018-02-22 14:46:00 NaN NaN
2018-02-22 14:47:00 4.09 4.64
2018-02-22 14:48:00 4.08 4.55
2018-02-22 14:49:00 4.08 4.48
2018-02-22 14:50:00 4.09 4.48
2018-02-22 14:51:00 4.11 4.50
2018-02-22 14:52:00 NaN NaN
2018-02-22 14:53:00 NaN NaN
2018-02-22 14:54:00 4.22 4.66
2018-02-22 14:55:00 4.25 4.72
2018-02-22 14:56:00 NaN NaN
2018-02-22 14:57:00 NaN NaN
2018-02-22 14:58:00 NaN NaN
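The answer assumes df already has a DatetimeIndex. A minimal, self-contained sketch of getting there and reindexing is shown here; it uses only a few rows from the question, the names raw, start, end and out are illustrative, and the explicit day-first format strings are an assumption about how the timestamps are stored. The bounds are floored before building the range, which should give the same minutes as flooring the whole range afterwards.

```python
import io

import pandas as pd

# A few rows from the question, written as CSV for illustration.
raw = """TIMESTAMP,Temp1,Temp2
22-02-2018 14:35,4.34,4.93
22-02-2018 14:36,4.35,5.02
22-02-2018 14:37,4.35,5.1
22-02-2018 14:39,4.31,5.23
"""

df = pd.read_csv(io.StringIO(raw))
# Parse day-first timestamps explicitly instead of relying on format guessing.
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], format="%d-%m-%Y %H:%M")
df = df.set_index("TIMESTAMP")

# Shortened bounds (assumed for this sketch) so the output stays small.
start = pd.to_datetime("22-02-2018 14:33:29", format="%d-%m-%Y %H:%M:%S")
end = pd.to_datetime("22-02-2018 14:40:29", format="%d-%m-%Y %H:%M:%S")

# Floor the bounds to whole minutes so the range lines up with the index.
dates = pd.date_range(start.floor("min"), end.floor("min"), freq="min")

# Minutes that are absent from df become rows of NaN.
out = df.reindex(dates)
out.index.name = "TIMESTAMP"
print(out)
```

With these bounds, out should hold one row per minute from 14:33 to 14:40, with NaN at 14:33, 14:34, 14:38 and 14:40.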
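Another solution, especially useful if there are duplicate timestamps, is to add the start and end datetimes and then resample with an aggregate function such as sum: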
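import numpy as np
# Add all-NaN rows at the boundaries so resample spans the full range.
df.loc[pd.to_datetime(start_date)] = np.nan
df.loc[pd.to_datetime(end_date)] = np.nan
# min_count=1 keeps empty minutes as NaN; a plain sum() would return 0 for them.
df = df.resample('60S').sum(min_count=1)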
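For the duplicate-timestamp case, one possible variation is to aggregate per minute first and then reindex onto the full range, since reindex on its own raises an error when the index contains duplicate labels. The sketch below is not the answer's exact method: the readings in df2 are made up for illustration, and averaging duplicates with mean is an assumption (for temperatures a mean is probably more meaningful than a sum, and it leaves empty minutes as NaN without extra arguments).

```python
import pandas as pd

# Made-up readings with a duplicated 14:35 timestamp (illustrative only).
idx = pd.to_datetime(
    ["22-02-2018 14:35", "22-02-2018 14:35", "22-02-2018 14:37"],
    format="%d-%m-%Y %H:%M",
)
df2 = pd.DataFrame(
    {"Temp1": [4.0, 5.0, 4.35], "Temp2": [1.0, 3.0, 5.10]},
    index=idx,
)

# Assumed bounds for this sketch.
start = pd.Timestamp("2018-02-22 14:34:29")
end = pd.Timestamp("2018-02-22 14:38:29")
dates = pd.date_range(start.floor("min"), end.floor("min"), freq="min")

# Collapse duplicates to one row per minute; empty minutes stay NaN.
per_minute = df2.resample("min").mean()

# Extend to the requested start and end; added minutes are NaN.
res = per_minute.reindex(dates)
print(res)
```

Here the two 14:35 rows collapse into a single averaged row, and 14:34, 14:36 and 14:38 come out as NaN.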