Question

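I have a dataset containing arrival times over a 24-hour window. After plotting the data, I noticed that it looks roughly linear. I would like to turn it into a generator in Python.

Plotted data: https://i.imgur.com/i2CJhtY.png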

Arrival data, in minutes of the day:
287.73
302.17
318.03
357.66
389.87
392.82
395.99
406.47
446.29
466.47
...
1341.88
1342.17
1348.14
1348.76
1369.15
1384.69
1390.71
The day ends at 1440 minutes.

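How can I turn this 24h data into a generator function, so that each call produces the next arrival time even beyond the original 24h period? It does not have to be a linear function; an exponential one would also be fine.

I was thinking about extrapolating the data into a large df, but that seems memory-inefficient, and I am also not sure how to extrapolate it that way.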

def generator(df):
    for index, row in df.iterrows():
        yield row['time']

Answer 1

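(Edited: improved the generator's output format)

You may want to use linear regression for this.

The idea is to train a model on the dataset so that it learns the underlying function. This is called "fitting" the model. You can then use the fitted model to predict future arrival times.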
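To make "fitting" concrete, here is a minimal sketch in plain Python, with no scikit-learn involved. It computes an ordinary least-squares line by hand from the first ten arrival times listed in the question and uses it to estimate the next arrival. The helper name `fit_line` and the variable names are assumptions made for this illustration only; they are not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# First ten arrival times (minutes of day) from the question.
arrivals = [287.73, 302.17, 318.03, 357.66, 389.87,
            392.82, 395.99, 406.47, 446.29, 466.47]
indices = list(range(len(arrivals)))

slope, intercept = fit_line(indices, arrivals)

# Predict the arrival time for the next, not-yet-seen index.
next_index = len(arrivals)
predicted_next = slope * next_index + intercept
print(f"slope={slope:.2f} min per arrival, next arrival ~ {predicted_next:.2f}")
```

The fitted slope can be read as the average gap between consecutive arrivals, which is why the model can keep producing arrival times past the end of the recorded day.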

Here is an example.

I will assume your dataframe looks like this:

    0
0   20.130714
1   37.598029
2   46.015164
3   52.042456
4   64.218346
5   58.528393
....
145 1296.520794
146 1320.282179
147 1327.387859
148 1326.318235
149 1337.973246

The first column is the index and the second column is the arrival time.

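from sklearn.linear_model import LinearRegression

def generator(df):
    # Feature: the row index as a column vector; target: the arrival times.
    X = df.index.values.reshape(-1, 1)
    y = df.values

    reg = LinearRegression().fit(X, y)

    # Predict for index last+1, last+2, ... without ever stopping.
    i = 1
    while True:
        # np.asscalar has been removed from NumPy; .item() returns the Python float.
        yield reg.predict(X[-1].reshape(1, -1) + i).item()
        i += 1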

Test:

for i,v in enumerate(generator(df)):
    print(i,v)
    if i == 10:
        break

Output:

0 1358.2313994853853
1 1367.2216112477986
2 1376.2118230102114
3 1385.2020347726243
4 1394.1922465350372
5 1403.1824582974505
6 1412.1726700598633
7 1421.1628818222762
8 1430.1530935846895
9 1439.1433053471023
10 1448.1335171095152
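
The values printed above advance by a constant step, which is what a straight-line fit implies: each prediction is the previous one plus the fitted slope. The short check below only re-uses the printed numbers, so it needs neither the dataframe nor scikit-learn; the variable names are chosen for this illustration.

```python
# Values printed by the test loop above.
predictions = [
    1358.2313994853853, 1367.2216112477986, 1376.2118230102114,
    1385.2020347726243, 1394.1922465350372, 1403.1824582974505,
    1412.1726700598633, 1421.1628818222762, 1430.1530935846895,
    1439.1433053471023, 1448.1335171095152,
]

# Gap between each pair of consecutive predictions.
steps = [b - a for a, b in zip(predictions, predictions[1:])]

for s in steps:
    print(round(s, 6))
```

Every gap comes out at about 8.99 minutes, so this generator produces evenly spaced arrivals and never reproduces the irregular gaps seen in the raw data.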

Converting arrival data from a short time window into a generator function
