Plotting a linear equation with statsmodels and matplotlib

Time: 2018-05-28 16:50:00

Tags: pandas linear-regression statsmodels

I have a dataframe named ndvi that looks like this:

    Year  Running      NDVI
0   1984        0  0.423529
1   1984       48  0.664205
2   1984      112  0.341656
3   1985      367  0.477519
4   1985      399  0.588417
5   1986      434  0.669474
6   1986      466  0.698148
7   1987      469  0.566785
8   1987      485  0.501238
9   1988      805  0.399277
10  1989     1140  0.666282
11  1990     1492  0.606567
12  1990     1540  0.505155
13  1991     1876  0.597450
14  1992     2180  0.280612
15  1992     2276  0.498419
16  1993     2563  0.413074
17  1993     2579  0.547831
18  1994     2915  0.345050
19  1994     2931  0.460600

I am running a linear model like this:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table

#run linear model
ndvi_x = sm.add_constant(ndvi['Running'])
ndvi_y = ndvi['NDVI']
ndvi_regr = sm.OLS(ndvi_y, ndvi_x)
ndvi_res = ndvi_regr.fit()
# Get fitted values from model to plot
ndvi_st, ndvi_data, ndvi_ss2 = summary_table(ndvi_res, alpha=0.05)
ndvi_fitted_values = ndvi_data[:,2]

#get confidence intervals
ndvi_predict_mean_ci_low, ndvi_predict_mean_ci_upp = ndvi_data[:,4:6].T

ndvi_CI_df = pd.DataFrame(columns = ['x_data', 'low_CI', 'upper_CI'])
ndvi_CI_df['x_data'] = ndvi['Year']
ndvi_CI_df['low_CI'] = ndvi_predict_mean_ci_low
ndvi_CI_df['upper_CI'] = ndvi_predict_mean_ci_upp
ndvi_CI_df.sort_values('x_data', inplace = True)


#plot the data
fig, ax = plt.subplots(figsize = (11, 6), sharey = True)
ax.scatter(ndvi['Year'], ndvi['NDVI'], color = 'black')
ax.plot(ndvi['Year'], ndvi_fitted_values, lw = 2, color = 'k')
ax.fill_between(ndvi_CI_df['x_data'], ndvi_CI_df['low_CI'], ndvi_CI_df['upper_CI'], color = 'gray', alpha = 0.4, label = '95% CI')
ax.set_xlabel("Year")
ax.set_ylabel("NDVI")

This returns:

[Image: the resulting plot]

What I don't understand is why the line of best fit isn't actually linear, but instead looks broken?

1 Answer:

Answer 0: (score: 2)

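The explanatory variable in your regression is Running, so your model is linear in that variable. When you create the plot, however, the x-axis represents Year. Year is not a linear function of Running (several observations share the same year but have different Running values), so the fitted values do not fall on a straight line when drawn against Year.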