How to add a new column with forecasts?

Time: 2019-11-21 11:59:37

Tags: python pandas time-series forecasting arima

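I am trying to make forecasts with an ARIMA model. My question is how to create a new column that holds my forecast values together with the new future dates (based on the number of future steps). This is my code: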

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# legacy API: statsmodels.tsa.arima_model was removed in statsmodels 0.13
from statsmodels.tsa.arima_model import ARIMA

df = pd.read_csv("Desktop/Daten/probe.csv", sep=";")
df["Monthes"] = pd.to_datetime(df["Monthes"])
indexedDf = df.set_index("Monthes")

model = ARIMA(indexedDf, order=(1, 1, 2))
results_ARIMA = model.fit(disp=0)
n = 120  # number of future steps to forecast
result = results_ARIMA.forecast(steps=n)[0]  # [0] = point forecasts

How can I put the forecast results, together with the new n months, into a new tab?

2 answers:

Answer 0 (score: 2)

Assuming you want to add this column to the dataframe (df), this is what you need to do:

df['result'] = result  # requires len(result) == len(df), otherwise pandas raises ValueError
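Because a forecast of n future steps usually does not have the same length as df, the direct assignment above fails in the typical case. A minimal sketch, assuming df has a regular DatetimeIndex (like indexedDf in the question) and that the forecast values are an array such as result; the helper name add_forecast_column is hypothetical. It extends the index with future dates at the data's own frequency and stores the forecast in a new column, leaving NaN in the historical rows:

```python
import numpy as np
import pandas as pd


def add_forecast_column(df: pd.DataFrame, forecast, column: str = "result") -> pd.DataFrame:
    """Hypothetical helper: extend df's DatetimeIndex by len(forecast) periods
    and put the forecast in a new column (NaN for the historical rows)."""
    freq = pd.infer_freq(df.index)  # e.g. "MS" for month-start data; needs >= 3 dates
    if freq is None:
        raise ValueError("cannot infer a regular frequency from the index")
    # periods + 1 because date_range starts at the last observed date; drop it
    future_index = pd.date_range(df.index[-1], periods=len(forecast) + 1, freq=freq)[1:]
    out = df.reindex(df.index.append(future_index))
    out.loc[future_index, column] = np.asarray(forecast)
    return out


# Usage with made-up monthly data: 3 observed months, 2 forecast months
history = pd.DataFrame(
    {"value": [10.0, 12.0, 11.0]},
    index=pd.date_range("2019-01-01", periods=3, freq="MS"),
)
print(add_forecast_column(history, [11.5, 11.8]))
```

With the question's variables this would be called as add_forecast_column(indexedDf, result).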

If you want to write the results to an Excel workbook, with each forecast in its own, suitably named worksheet:

N = [30, 60, 90, 120]  # forecast horizons, one worksheet each
with pd.ExcelWriter('output.xlsx') as writer:
    # write several forecasts to the same file,
    # each in its own worksheet
    for n in N:
        result = results_ARIMA.forecast(steps=n)[0]
        df['result'] = result  # as above, only valid if n == len(df)
        df.to_excel(writer, sheet_name=f'Sheet_n={n}')
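Since each horizon produces a different number of rows, a variant that avoids reassigning df['result'] is to build one small frame per horizon, indexed by its own future dates. A sketch under stated assumptions: forecast_fn is a hypothetical callable standing in for lambda n: results_ARIMA.forecast(steps=n)[0], the data are month-start ("MS"), and writing the file needs an Excel engine such as openpyxl installed.

```python
import numpy as np
import pandas as pd


def build_forecast_sheets(last_date, freq, horizons, forecast_fn):
    """Return {sheet_name: DataFrame}, one forecast table per horizon.

    forecast_fn(n) is a hypothetical callable that returns n forecast values.
    """
    sheets = {}
    for n in horizons:
        # future dates strictly after last_date
        dates = pd.date_range(last_date, periods=n + 1, freq=freq)[1:]
        sheets[f"Sheet_n={n}"] = pd.DataFrame(
            {"result": np.asarray(forecast_fn(n))}, index=dates
        )
    return sheets


def write_sheets(path, sheets):
    # Requires an Excel engine such as openpyxl.
    with pd.ExcelWriter(path) as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name)


# Usage with a dummy forecast function (returns 0.0, 1.0, ..., n-1)
sheets = build_forecast_sheets("2019-11-01", "MS", [3, 6], lambda n: np.arange(n, dtype=float))
for name, frame in sheets.items():
    print(name, len(frame), frame.index[0].date(), frame.index[-1].date())
# write_sheets("output.xlsx", sheets)
```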

If you want to name the worksheet after tomorrow's date (2019-11-22), simply change it to sheet_name='2019-11-22'.

How do you get tomorrow's date?

import datetime
def tomorrow():
    return datetime.date.today() + datetime.timedelta(days=1)
print(tomorrow())
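If you are already working in pandas, the same date can be computed with pd.Timestamp and formatted directly as a worksheet name. A small sketch; the function name tomorrow_sheet_name is hypothetical:

```python
import datetime

import pandas as pd


def tomorrow_sheet_name() -> str:
    """Worksheet name such as '2019-11-22' for the day after today."""
    tomorrow = pd.Timestamp.today().normalize() + pd.Timedelta(days=1)
    return tomorrow.strftime("%Y-%m-%d")


name = tomorrow_sheet_name()
print(name)
# matches the standard-library version above
print(name == (datetime.date.today() + datetime.timedelta(days=1)).isoformat())
```

It can then be passed as df.to_excel(writer, sheet_name=tomorrow_sheet_name()).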

Converting dates to strings:

dates.apply(lambda x: x.strftime('%Y-%m-%d'))  # dates: a pandas Series of datetimes
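For a Series of datetimes, pandas also offers a vectorized equivalent through the .dt accessor, which avoids calling a Python lambda per element. A sketch with made-up dates:

```python
import pandas as pd

# Example Series of datetimes (made-up values)
dates = pd.Series(pd.to_datetime(["2019-11-22", "2019-11-23"]))

# Vectorized alternative to dates.apply(lambda x: x.strftime(...))
as_text = dates.dt.strftime("%Y-%m-%d")
print(as_text.tolist())  # ['2019-11-22', '2019-11-23']
```

For a DatetimeIndex (such as indexedDf.index in the question), call index.strftime("%Y-%m-%d") directly.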

I encourage you to read the documentation to get a clearer understanding of pandas.ExcelWriter.

Answer 1 (score: 1)

You can do it like this:

Suppose your dataframe looks like this:

         date  spend
0  2019-11-10    800
1  2019-11-11    800
2  2019-11-12    300
3  2019-11-13    150
4  2019-11-14    300
5  2019-11-15    500
6  2019-11-16    800
7  2019-11-17    600
8  2019-11-18    400

n = 5
# start one day after the last observed date so no date is repeated
t = pd.date_range(start=pd.to_datetime(df.date.iloc[-1]) + pd.Timedelta(days=1), periods=n)
# placeholder predictions (random, so the values change on every run);
# in practice use the forecast, e.g. result from the question
predictions = np.random.rand(n) * 1000
new_df = pd.DataFrame({"date": t, "spend": predictions})
print(new_df)
        date      spend
0 2019-11-19  94.944353
1 2019-11-20  64.813264
2 2019-11-21  56.319640
3 2019-11-22  81.696114
4 2019-11-23  43.533978
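pd.date_range defaults to daily steps, which fits this example. For monthly data such as the question's "Monthes" column, the frequency has to be passed explicitly. A sketch of the same step as a reusable function; the name make_future_frame and its defaults are hypothetical:

```python
import numpy as np
import pandas as pd


def make_future_frame(df, values, freq="D", date_col="date", value_col="spend"):
    """Hypothetical helper: one row per forecast value, dated after df's last date.

    freq defaults to daily as in the example; use "MS" for month-start data.
    """
    last = pd.to_datetime(df[date_col].iloc[-1])
    # periods + 1, then drop the first date, which equals the last observed date
    future = pd.date_range(last, periods=len(values) + 1, freq=freq)[1:]
    return pd.DataFrame({date_col: future, value_col: np.asarray(values)})


df = pd.DataFrame({"date": ["2019-11-17", "2019-11-18"], "spend": [600, 400]})
new_df = make_future_frame(df, [94.9, 64.8, 56.3])
print(new_df)
```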

Now you can finally merge/append it to your dataframe:

df = pd.concat([df, new_df]).reset_index(drop=True)

Output:

         date  spend
0  2019-11-10    800
1  2019-11-11    800
2  2019-11-12    300
3  2019-11-13    150
4  2019-11-14    300
5  2019-11-15    500
6  2019-11-16    800
7  2019-11-17    600
8  2019-11-18    400
9  2019-11-19    94.944353
10 2019-11-20    64.813264
11 2019-11-21    56.319640
12 2019-11-22    81.696114
13 2019-11-23    43.533978
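After the concat, observed and predicted values share the spend column, so nothing marks which rows are forecasts. A small sketch that adds a boolean flag before concatenating; the column name is_forecast is chosen here for illustration. It also converts both date columns to datetimes up front, which matters if df.date holds strings while new_df.date holds Timestamps.

```python
import pandas as pd

# Made-up history and forecast frames shaped like the example above
df = pd.DataFrame({"date": pd.to_datetime(["2019-11-17", "2019-11-18"]), "spend": [600, 400]})
new_df = pd.DataFrame({"date": pd.to_datetime(["2019-11-19", "2019-11-20"]), "spend": [94.94, 64.81]})

# Tag each row so observed and forecast values stay distinguishable after concat
combined = pd.concat(
    [df.assign(is_forecast=False), new_df.assign(is_forecast=True)],
    ignore_index=True,
)
print(combined)
```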