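Polynomial fit does not plot the higher degrees

Time: 2017-07-12 14:23:37

Tags: python numpy matplotlib

I am working on regression and trying to fit polynomial models of three different degrees to my data, but only the lowest degree gets plotted. I can't figure out where I went wrong. Here are my code and data points: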

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
import time

def main():
    inputfile = "statistics.txt"
    X = np.loadtxt(inputfile, delimiter=",", dtype=str, usecols=[0])  # np.str was removed from recent NumPy; the builtin str works
    X= [dt.datetime.strptime(date, '%d.%m.%Y') for date in X]
    X = mdates.date2num(X)
    Y= np.loadtxt(inputfile, delimiter=",", usecols=[1])
    num_training = int(0.9*len(X))
    num_test = len(X) - num_training
    X_train, Y_train = X[:num_training], Y[:num_training]
    X_test, Y_test = X[num_training:], Y[num_training:]
    plt.scatter(X_train, Y_train, color="blue",s=10, marker='o')
    plt.title("Euro Swedish Krona Exchange rate")
    plt.xlabel("Time in months from April to June in 2017")
    plt.ylabel("Exchange rate")
    colors = ['teal', 'yellowgreen', 'gold']
    for count, degree in enumerate([2, 3, 4]):
        coeffs = np.polyfit(X_train, Y_train, degree)
        f = np.poly1d(coeffs)
        x_line = np.linspace(X[0], X[-1], 50)
        x_line_plot = mdates.num2date(x_line)
        y_line = f(x_line)
        plt.plot(x_line_plot, y_line, color=colors[count], linewidth=2, label="degree {}".format(degree))
        print(coeffs)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    plt.gca().xaxis.set_major_locator(mdates.MonthLocator())    
    plt.grid()
    plt.legend(loc='upper left')
    plt.show()

if __name__ == '__main__':
    main()

- - - - - - - - statistics.txt

11.04.2017,9.6059
12.04.2017,9.5741
13.04.2017,9.5976
14.04.2017,9.5892
17.04.2017,9.5763
18.04.2017,9.6101
19.04.2017,9.6107
20.04.2017,9.6309
21.04.2017,9.6611
24.04.2017,9.6266
25.04.2017,9.5858
26.04.2017,9.5551
27.04.2017,9.6070
28.04.2017,9.6474
01.05.2017,9.6438
02.05.2017,9.6220
03.05.2017,9.6326
04.05.2017,9.7007
05.05.2017,9.6669
08.05.2017,9.6616
09.05.2017,9.6649
10.05.2017,9.6974
11.05.2017,9.6489
12.05.2017,9.6480
15.05.2017,9.6903
16.05.2017,9.7402
17.05.2017,9.7432
18.05.2017,9.7797
19.05.2017,9.7800
22.05.2017,9.7683
23.05.2017,9.7363
24.05.2017,9.7255
25.05.2017,9.7378
26.05.2017,9.7233
29.05.2017,9.7138
30.05.2017,9.7580
31.05.2017,9.7684
01.06.2017,9.7402
02.06.2017,9.7256
05.06.2017,9.7388
06.06.2017,9.7707
07.06.2017,9.7833
08.06.2017,9.7685
09.06.2017,9.7579
12.06.2017,9.7980
13.06.2017,9.7460
14.06.2017,9.7634
15.06.2017,9.7540
16.06.2017,9.7510
19.06.2017,9.7475
20.06.2017,9.7789
21.06.2017,9.7676
22.06.2017,9.7581
23.06.2017,9.7629
26.06.2017,9.7537
27.06.2017,9.7647
28.06.2017,9.7213
29.06.2017,9.6806
30.06.2017,9.6309
03.07.2017,9.6479
04.07.2017,9.6740
05.07.2017,9.6332
06.07.2017,9.6457
07.07.2017,9.6084
10.07.2017,9.6101
11.07.2017,9.6299

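I think this has something to do with the dates, because my plot worked without them. There may also be more in my code than necessary. Strangely, depending on the degree values I choose, I sometimes get three curves and sometimes only one.

1 Answer:

Answer 0: (score: 2)

As @DavidG pointed out in a comment, the three curves are very close together, so unless you zoom in they look identical.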

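That is only a symptom of the underlying problem. You may have noticed the warnings printed when you run the code. They indicate that polyfit is running into numerical problems. Your X values are relatively large and very close together. Evidently that is enough to cause trouble for polyfit. One way to avoid this is to subtract the mean from the X values so that they are centered at 0. (You could also consider dividing the shifted data by its standard deviation. This combination of shifting and scaling is sometimes called whitening. In this case, shifting the data is enough.)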
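To get a rough feel for why the shift helps, the sketch below compares the condition number of the Vandermonde matrix for raw, centered, and whitened X values. The X values here are synthetic: the offset 736430 is an assumption, chosen to resemble the date numbers in the question, and `vander_cond` is a hypothetical helper, not part of NumPy. polyfit may apply its own internal rescaling, so treat the numbers as indicative only.

```python
import numpy as np

# Synthetic stand-in for the date numbers (assumption): 60 consecutive
# days sitting on top of a large offset, so the values are big and close.
x = 736430.0 + np.arange(60.0)
degree = 4


def vander_cond(values, deg):
    """Condition number of the Vandermonde matrix for a degree-`deg` fit.

    Hypothetical helper for illustration; larger means worse conditioned.
    """
    return np.linalg.cond(np.vander(values, deg + 1))


raw = vander_cond(x, degree)
centered = vander_cond(x - x.mean(), degree)
whitened = vander_cond((x - x.mean()) / x.std(), degree)

print("raw:      {:.3e}".format(raw))
print("centered: {:.3e}".format(centered))
print("whitened: {:.3e}".format(whitened))
```

The smaller the condition number, the less rounding error is amplified in the least-squares solve, which is why the higher-degree fits become usable once the data is shifted.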

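Here is a modified version of the fitting and plotting code that applies this shift to the X values (I also tweaked the colors and line styles):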

colors = ['teal', 'darkgreen', 'black']
linestyles = ['-', ':', '--']  # format strings for plt.plot, one per degree
alphas = [1, 1, 0.25]
mu = X_train.mean()  # shift applied to X before fitting and evaluating
for count, degree in enumerate([2, 3, 4]):
    # Fit on centered X values to keep polyfit well conditioned.
    coeffs = np.polyfit(X_train - mu, Y_train, degree)
    f = np.poly1d(coeffs)
    x_line = np.linspace(X[0], X[-1], 50)
    x_line_plot = mdates.num2date(x_line)
    # Apply the same shift when evaluating the fitted polynomial.
    y_line = f(x_line - mu)
    plt.plot(x_line_plot, y_line, linestyles[count], color=colors[count],
             linewidth=1+2*(count==2), alpha=alphas[count],
             label="degree {}".format(degree))
    print(coeffs)

It turns out that the degree 3 and degree 4 curves are still close, but they are quite different from the degree 2 curve:

plot
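As a possible alternative to centering by hand, `numpy.polynomial.Polynomial.fit` maps the X values onto the window [-1, 1] before solving, so the rescaling is handled for you. The sketch below is not the approach used in the answer above; it runs on synthetic data (the offset 736430 and the cubic used for `y` are assumptions, not the exchange-rate data).

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic stand-in data (assumption): large, closely spaced x values
# and a y that is an exact cubic in a rescaled variable t.
x = 736430.0 + np.arange(60.0)
t = (x - x.mean()) / 30.0
y = 9.7 + 0.05 * t - 0.08 * t**2 + 0.02 * t**3

fits = {}
max_err = {}
for degree in (2, 3, 4):
    # Polynomial.fit rescales x to [-1, 1] internally before the
    # least-squares solve; the result remembers that mapping.
    fits[degree] = Polynomial.fit(x, y, degree)
    # The fitted object is called with the original, unshifted x values.
    max_err[degree] = np.max(np.abs(fits[degree](x) - y))
    print("degree {}: max abs error {:.3e}".format(degree, max_err[degree]))

# Values for drawing each curve, evaluated on the original x scale.
x_line = np.linspace(x[0], x[-1], 50)
y_lines = {degree: p(x_line) for degree, p in fits.items()}
```

Note that the coefficients stored on the returned object refer to the rescaled variable; calling `convert()` on it gives coefficients in terms of the original X, at the cost of reintroducing very large and very small numbers.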