Question

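I want to use linear regression to predict a Y value that represents the number of A-type clients per unit of time, where the X values are time series data.

The code is: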

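    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({'time': past_time_array, 'A_clients': client_A_array})
    # Use the sample index 0..n-1 as x, since the timestamps are evenly spaced
    x_a = np.arange(len(past_time_array))
    fit_A = np.polyfit(x_a, df1['A_clients'], 1)  # degree-1 least-squares fit
    fit_fn_A = np.poly1d(fit_A)

    print(df1)
    print("fitness function = %s" % fit_fn_A)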

Output of printing df1:

   A_clients                time
0           0 2018-02-09 14:45:00
1           0 2018-02-09 14:46:00
2           1 2018-02-09 14:47:00
3           4 2018-02-09 14:48:00
4           4 2018-02-09 14:49:00
5           2 2018-02-09 14:50:00
6           2 2018-02-09 14:51:00
7           2 2018-02-09 14:52:00
8           2 2018-02-09 14:53:00
9           4 2018-02-09 14:54:00
10          1 2018-02-09 14:55:00
11          3 2018-02-09 14:56:00
12          4 2018-02-09 14:57:00
13          2 2018-02-09 14:58:00
14          4 2018-02-09 14:59:00
15          3 2018-02-09 15:00:00
16          1 2018-02-09 15:01:00
17          1 2018-02-09 15:02:00
18          0 2018-02-09 15:03:00
19          4 2018-02-09 15:04:00
20          1 2018-02-09 15:05:00
21          1 2018-02-09 15:06:00
22          4 2018-02-09 15:07:00
23          4 2018-02-09 15:08:00

Output of printing the fitness function:

0.0001389 x + 2.213

The problem is that when I try to predict a value like this:

    predicted_ta = fit_fn_A(x_a[10])
    print("predicted values = %f" % predicted_ta)

it always gives me 2.213, which is the c value in y = mx + c.
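The near-constant prediction follows directly from the size of the slope. As a minimal sketch, the coefficients printed above can be plugged into `np.poly1d` and evaluated over the 24 sample indices; the variable names `fit_fn`, `xs` and `spread` are introduced here for illustration only and are not part of the original code.

```python
import numpy as np

# Coefficients copied from the printed fit: y = 0.0001389 x + 2.213
fit_fn = np.poly1d([0.0001389, 2.213])

xs = np.arange(24)          # the 24 sample indices 0..23
predictions = fit_fn(xs)

# Total change in the prediction between the first and the last sample
spread = predictions[-1] - predictions[0]

print("x=0  -> %f" % predictions[0])
print("x=10 -> %f" % predictions[10])
print("x=23 -> %f" % predictions[-1])
print("spread over all samples = %f" % spread)
```

With a slope this small, every prediction in the sampled range rounds to roughly the intercept, so the model is being evaluated correctly; the fitted line is simply almost flat.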

The best-fit line looks like this:

Edit 1

When I count the number of clients per 2 minutes instead of per minute, the regression line has some slope.

Answer 1

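The values were being predicted correctly. It is just that when I counted the number of clients per minute, the plot was a flat line, as shown above. When I computed the regression line for the number of clients per 2 minutes instead, the fitness function gave a sensible result.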
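One possible way to do the 2-minute aggregation is sketched below. It assumes the per-minute counts from the printed df1 and uses pandas `resample`; `origin="start"` (available in pandas 1.1 and later) aligns the bins to the first timestamp. The names `per_2min`, `x_2` and `fit_fn_2` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Per-minute counts copied from the printed df1 in the question
counts = [0, 0, 1, 4, 4, 2, 2, 2, 2, 4, 1, 3,
          4, 2, 4, 3, 1, 1, 0, 4, 1, 1, 4, 4]
times = pd.date_range("2018-02-09 14:45:00", periods=len(counts), freq="min")
df1 = pd.DataFrame({"time": times, "A_clients": counts})

# Sum the counts in 2-minute bins, with bins aligned to the first timestamp
per_2min = (df1.set_index("time")["A_clients"]
               .resample("2min", origin="start")
               .sum())

# Fit a straight line against the bin index 0..n-1
x_2 = np.arange(len(per_2min))
slope, intercept = np.polyfit(x_2, per_2min.to_numpy(), 1)
fit_fn_2 = np.poly1d([slope, intercept])

print(per_2min)
print("fitness function = %s" % fit_fn_2)
```

Summing over wider bins smooths out some of the minute-to-minute noise, which is one reason a trend can become visible in the fit.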

Answer 2

You can't apply this model here. There is no dependence at all between the per-minute count and time.

Try computing the cumulative number of clients instead (value[x] = sum(values[:x])). That usually fits a log() model very well.
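A rough sketch of this suggestion follows, using the counts from the printed df1. It makes two assumptions that the answer does not spell out: the running total is taken inclusively with `np.cumsum` (the slice `values[:x]` in the answer would exclude element x), and "log() model" is read as cumulative ≈ a·log(x) + b, fitted by least squares. The names `cumulative`, `a`, `b` and `log_model` are illustrative.

```python
import numpy as np

# Per-minute counts copied from the printed df1 in the question
counts = np.array([0, 0, 1, 4, 4, 2, 2, 2, 2, 4, 1, 3,
                   4, 2, 4, 3, 1, 1, 0, 4, 1, 1, 4, 4])

# Running total: cumulative[i] = counts[0] + ... + counts[i]
cumulative = np.cumsum(counts)

# Start x at 1 so that log(x) is defined for every sample
x = np.arange(1, len(counts) + 1)

# Least-squares fit of cumulative ~ a * log(x) + b
a, b = np.polyfit(np.log(x), cumulative, 1)

def log_model(x_new):
    """Evaluate the fitted a * log(x) + b at x_new (x_new must be > 0)."""
    return a * np.log(x_new) + b

print("cumulative counts =", cumulative)
print("fit: %.3f * log(x) + %.3f" % (a, b))
print("prediction for the next minute = %f" % log_model(len(counts) + 1))
```

Whether a logarithmic curve or a plain straight line describes the running total better depends on the data, so it is worth plotting both fits against `cumulative` before relying on either.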

Predicting y values for time series data using linear regression in Python
