How do I extend a date_range in a Pandas DataFrame?

Time: 2018-02-08 14:39:30

Tags: pandas date-range datetimeindex

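Some data is collected every 5 seconds, and samples are sometimes missing.

After loading the data into a Pandas DataFrame, I want to define a start time and extract exactly 180 rows (15 minutes x 12 samples per minute), regardless of the starting point. The data feeds a chart, and keeping it the same size every time simplifies the rest of the code.

Missing data should be filled with None.

I suspect there is a shortcut I don't know about: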

import pandas as pd

dt = [
    "2018-02-08 13:45:05",
    "2018-02-08 13:45:10",
    "2018-02-08 13:45:25",
    "2018-02-08 13:45:30",
    "2018-02-08 13:45:35",
    "2018-02-08 13:45:40",
    "2018-02-08 13:45:50",
    "2018-02-08 13:45:55",
    "2018-02-08 13:46:00",
    "2018-02-08 13:46:05",
]

wl = [
    4737.25,
    4834.80,
    4885.53,
    5003.98,
    5031.08,
    5215.90,
    5147.65,
    5100.50,
    5038.94,
    5020.67,
]

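# Build the frame and use the timestamps as a DatetimeIndex
df = pd.DataFrame({"dt": dt, "wl": wl}).set_index("dt")
df.index = pd.to_datetime(df.index)
# Resample onto a regular 5-second grid; gaps between samples become NaN
df = df.resample("5s").mean()
print(df)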

This returns:

                          wl
dt                          
2018-02-08 13:45:05  4737.25
2018-02-08 13:45:10  4834.80
2018-02-08 13:45:15      NaN
2018-02-08 13:45:20      NaN
2018-02-08 13:45:25  4885.53
2018-02-08 13:45:30  5003.98
2018-02-08 13:45:35  5031.08
2018-02-08 13:45:40  5215.90
2018-02-08 13:45:45      NaN
2018-02-08 13:45:50  5147.65
2018-02-08 13:45:55  5100.50
2018-02-08 13:46:00  5038.94
2018-02-08 13:46:05  5020.67

This is fine, but the datetime range is defined by the datetimes of the first and last samples.

The datetime range I am interested in is:

new_datetime_range = pd.date_range(start=df.index.min(), freq="5s", periods=180)
print(new_datetime_range)

which runs up to '2018-02-08 14:00:00'.
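As a quick sanity check of that end point, here is a minimal sketch. The start timestamp is hard-coded from the sample data above, and the variable names `start` and `window` are only illustrative.

```python
import pandas as pd

# Start time taken from the first sample in the question
start = pd.Timestamp("2018-02-08 13:45:05")

# 180 periods span 179 intervals of 5 s, i.e. 895 s after the start
window = pd.date_range(start=start, freq="5s", periods=180)

print(len(window), window[0], window[-1])
```

Because `periods` counts the start itself, the last timestamp is 179 steps after the first, not 180.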

I would like to get:

                          wl
dt                          
2018-02-08 13:45:05  4737.25
2018-02-08 13:45:10  4834.80
2018-02-08 13:45:15      NaN
2018-02-08 13:45:20      NaN
2018-02-08 13:45:25  4885.53
2018-02-08 13:45:30  5003.98
2018-02-08 13:45:35  5031.08
2018-02-08 13:45:40  5215.90
2018-02-08 13:45:45      NaN
2018-02-08 13:45:50  5147.65
2018-02-08 13:45:55  5100.50
2018-02-08 13:46:00  5038.94
2018-02-08 13:46:05  5020.67
2018-02-08 13:46:10      NaN
2018-02-08 13:46:15      NaN
............................
2018-02-08 13:59:45      NaN
2018-02-08 13:59:50      NaN
2018-02-08 13:59:55      NaN
2018-02-08 14:00:00      NaN

How can this be done?

1 answer:

Answer 0 (score: 1)

I think you need reindex:

df = df.resample("5s").mean().reindex(new_datetime_range)
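To make the reindex approach reusable for any start time, it could be wrapped in a small function. This is only a sketch: `fixed_window` is a hypothetical helper name, the three-row frame is a shortened copy of the question's data, and it assumes the start time lies on the same 5-second grid that `resample` produces.

```python
import pandas as pd


def fixed_window(df, start, periods=180, freq="5s"):
    """Hypothetical helper: return exactly `periods` rows beginning at `start`.

    Assumes `start` falls on the same grid as the resampled index.
    """
    target = pd.date_range(start=start, freq=freq, periods=periods)
    # resample fills the gaps between samples with NaN;
    # reindex then pads (or trims) the result to the target range
    return df.resample(freq).mean().reindex(target)


# Shortened copy of the question's data (first three samples only)
raw = pd.DataFrame(
    {"wl": [4737.25, 4834.80, 4885.53]},
    index=pd.to_datetime(
        ["2018-02-08 13:45:05", "2018-02-08 13:45:10", "2018-02-08 13:45:25"]
    ),
)

out = fixed_window(raw, raw.index.min())
print(out.shape)
print(out.head())
```

Timestamps in the target range that have no sample come back as NaN, so the chart always receives the same number of rows.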

Another solution is to manually add the last date to the index:

import numpy as np

last = pd.date_range(start=df.index.min(), freq="5s", periods=180)[-1]
# Add an empty row at the last timestamp so resample extends the grid up to it
df.loc[last] = np.nan
df = df.resample("5s").mean()

print(df)
                          wl
2018-02-08 13:45:05  4737.25
2018-02-08 13:45:10  4834.80
2018-02-08 13:45:15      NaN
2018-02-08 13:45:20      NaN
2018-02-08 13:45:25  4885.53
2018-02-08 13:45:30  5003.98
2018-02-08 13:45:35  5031.08
2018-02-08 13:45:40  5215.90
2018-02-08 13:45:45      NaN
2018-02-08 13:45:50  5147.65
2018-02-08 13:45:55  5100.50
2018-02-08 13:46:00  5038.94
2018-02-08 13:46:05  5020.67
2018-02-08 13:46:10      NaN
2018-02-08 13:46:15      NaN
...
...
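The two approaches can be compared side by side. The sketch below uses a shortened copy of the question's data, and the names `raw`, `target`, `a` and `b` are illustrative rather than taken from the original code.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's data (first three samples only)
raw = pd.DataFrame(
    {"wl": [4737.25, 4834.80, 4885.53]},
    index=pd.to_datetime(
        ["2018-02-08 13:45:05", "2018-02-08 13:45:10", "2018-02-08 13:45:25"]
    ),
)
target = pd.date_range(start=raw.index.min(), freq="5s", periods=180)

# Approach 1: resample, then reindex onto the target range
a = raw.resample("5s").mean().reindex(target)

# Approach 2: add a NaN row at the last timestamp, then resample
b = raw.copy()
b.loc[target[-1]] = np.nan
b = b.resample("5s").mean()

print(len(a), len(b))
print(np.allclose(a["wl"], b["wl"], equal_nan=True))
```

One difference worth noting: reindex also trims rows that fall outside the target range, whereas the second approach only extends the index to the last timestamp and keeps whatever lies before or after it.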