Consider the following Python program:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = [["2017-05-25 22:00:00", 5],
["2017-05-25 22:05:00", 7],
["2017-05-25 22:10:00", 9],
["2017-05-25 22:15:00", 10],
["2017-05-25 22:20:00", 15],
["2017-05-25 22:25:00", 20],
["2017-05-25 22:30:00", 25],
["2017-05-25 22:35:00", 32]]
df = pd.DataFrame(data)
df.columns = ["date", "value"]
df["date2"] = pd.to_datetime(df["date"],format="%Y-%m-%d %H:%M:%S")
ts = pd.Series(df["value"].values, index=df["date2"])
mean_smoothed = ts.rolling(window=5).mean()
exp_smoothed = ts.ewm(alpha=0.5).mean()
h1 = ts.head(8)
h2 = mean_smoothed.head(8)
h3 = exp_smoothed.head(8)
k = pd.concat([h1, h2, h3], join='outer', axis=1)
k.columns = ["Actual", "Moving Average", "Exp Smoothing"]
print(k)
This prints:
                     Actual  Moving Average  Exp Smoothing
date2
2017-05-25 22:00:00       5             NaN       5.000000
2017-05-25 22:05:00       7             NaN       6.333333
2017-05-25 22:10:00       9             NaN       7.857143
2017-05-25 22:15:00      10             NaN       9.000000
2017-05-25 22:20:00      15             9.2      12.096774
2017-05-25 22:25:00      20            12.2      16.111111
2017-05-25 22:30:00      25            15.8      20.590551
2017-05-25 22:35:00      32            20.4      26.317647
Plotting the chart:
plt.figure(figsize=(16,5))
plt.plot(ts, label="Original")
plt.plot(mean_smoothed, label="Moving Average")
plt.plot(exp_smoothed, label="Exponentially Weighted Average")
plt.legend()
plt.show()
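Both the moving average (MA) and exponential smoothing (ES) introduce a lag: in the MA example above, 5 values are needed to predict the 6th value. However, the MA column in the table contains only 4 NaN values, and the 5th value is already non-NaN (= the first predicted value).
Question: how can I plot these values so that the lag is preserved correctly? With ES it is even more obvious: ES should start at t = 2, but it starts immediately.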
Answer 0 (score: 0)
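You seem to have misunderstood the moving average. MA(5) needs 5 data points to compute one value. As soon as you receive the 5th point, you can compute the average at the 5th point from points 1-5. So you should have exactly 4 NaNs.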
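The point above can be checked directly. The following is a minimal sketch that rebuilds the series from the question (the names `values`, `index`, `ma`, `nan_count`, `first_valid` and `manual` are chosen here for illustration) and compares the first non-NaN moving-average value with a hand-computed mean of points 1-5:

```python
import pandas as pd

# Same eight observations as in the question, spaced 5 minutes apart.
values = [5, 7, 9, 10, 15, 20, 25, 32]
index = pd.date_range("2017-05-25 22:00:00", periods=len(values), freq="5min")
ts = pd.Series(values, index=index)

# A window of 5 yields its first result at the 5th point,
# so only the first 4 entries are NaN.
ma = ts.rolling(window=5).mean()

nan_count = int(ma.isna().sum())      # number of leading NaNs
first_valid = ma.dropna().iloc[0]     # value reported at the 5th timestamp
manual = sum(values[:5]) / 5          # mean of points 1-5 computed by hand

print(nan_count, first_valid, manual)
```

If the reasoning in this answer holds, `nan_count` is 4 and `first_valid` agrees with `manual`, which means the value at the 5th timestamp is an average over data up to and including that timestamp, not a forecast for it.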
If you want to shift the data, you can try:
df.shift(n) # n is an integer
Shift "Actual" by -1, or shift the other columns by 1.
Here is its documentation.
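As a hedged illustration of the shift suggestion, the sketch below treats each smoothed value as a one-step-ahead forecast by moving it one row forward with `Series.shift(1)`. The names `ma_forecast` and `es_forecast` are introduced here and are not part of the question's code; whether a shift of exactly one step is the right lag depends on how the smoothed values are meant to be read.

```python
import pandas as pd

# Rebuild the series from the question.
values = [5, 7, 9, 10, 15, 20, 25, 32]
index = pd.date_range("2017-05-25 22:00:00", periods=len(values), freq="5min")
ts = pd.Series(values, index=index)

# Shift each smoothed series down by one row so that the value computed
# from data up to time t is placed at time t+1 (assumption: one-step forecast).
ma_forecast = ts.rolling(window=5).mean().shift(1)
es_forecast = ts.ewm(alpha=0.5).mean().shift(1)

k = pd.concat([ts, ma_forecast, es_forecast], axis=1)
k.columns = ["Actual", "Moving Average (t+1)", "Exp Smoothing (t+1)"]
print(k)
```

With this shift the moving-average column has 5 leading NaNs and its first value sits at the 6th timestamp, while the exponential smoothing column starts at the 2nd timestamp. Passing `ma_forecast` and `es_forecast` to `plt.plot` in place of the unshifted series draws the lines with that lag.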
Answer 1 (score: 0)
Interpolation should solve the problem.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = [["2017-05-25 22:00:00", 5],
["2017-05-25 22:05:00", 7],
["2017-05-25 22:10:00", 9],
["2017-05-25 22:15:00", 10],
["2017-05-25 22:20:00", 15],
["2017-05-25 22:25:00", 20],
["2017-05-25 22:30:00", 25],
["2017-05-25 22:35:00", 32]]
df = pd.DataFrame(data)
df.columns = ["date", "value"]
df["date2"] = pd.to_datetime(df["date"],format="%Y-%m-%d %H:%M:%S")
ts = pd.Series(df["value"].values, index=df["date2"])
mean_smoothed = ts.rolling(window=5).mean()
###### NEW #########
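# Use positional access; integer keys on a datetime-indexed Series are deprecated in recent pandas.
mean_smoothed.iloc[0] = ts.iloc[0]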
mean_smoothed.interpolate(inplace=True)
####################
exp_smoothed = ts.ewm(alpha=0.5).mean()
h1 = ts.head(8)
h2 = mean_smoothed.head(8)
h3 = exp_smoothed.head(8)
k = pd.concat([h1, h2, h3], join='outer', axis=1)
k.columns = ["Actual", "Moving Average", "Exp Smoothing"]
print(k)
plt.figure(figsize=(16,5))
plt.plot(ts, label="Original")
plt.plot(mean_smoothed, label="Moving Average")
plt.plot(exp_smoothed, label="Exponentially Weighted Average")
plt.legend()
plt.show()