How do I get Significance F and R Square using LinearRegression?

Asked: 2018-10-18 06:26:03

Tags: python scikit-learn linear-regression

I use this code to run a LinearRegression with sklearn:

from sklearn.linear_model import LinearRegression
import pandas as pd

def calculate_Intercept_X_Variable():
    list_a=[['2018', '3', 'aa', 'aa', 93,1884.7746222667, 165.36153386251098], ['2018', '3', 'bb', 'bb', 62, 665.6392779848, 125.30386609565328], ['2018', '3', 'cc', 'cc', 89, 580.2259903521, 160.19280253775514]]
    df = pd.DataFrame(list_a)
    X = df.iloc[:, 5]
    y = df.iloc[:, 6]
    X = X.values.reshape(-1, 1)
    y = y.values.reshape(-1, 1)
    clf = LinearRegression()
    clf.fit(X, y)
    para_Intercept = clf.intercept_[0] #133.10871357512195
    para_X_Variable_1 = clf.coef_[0][0] #0.016460552337949654
    para_Significance_F=""
    para_R_Square=""

calculate_Intercept_X_Variable()

If I use Excel, I run its regression analysis on the following data:

X           Y 
1884.774622 165.3615339
665.639278  125.3038661
580.2259904 160.1928025


Excel generates a regression summary like this for me: [image]

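I want to use LinearRegression to get the two parameters Significance F and R Square, just as Excel reports them (I marked them in green).

Is my code correct so far?

How can I get the Significance F and R Square parameters?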

1 Answer:

Answer 0: (score: 0)

I have figured out how to do it. Here is the code:

from sklearn.linear_model import LinearRegression
import pandas as pd
import numpy as np
from scipy.stats import linregress

def calculate_Intercept_X_Variable():
    list_a=[['2018', '3', 'aa', 'aa', 93,1884.7746222667, 165.36153386251098], ['2018', '3', 'bb', 'bb', 62, 665.6392779848, 125.30386609565328], ['2018', '3', 'cc', 'cc', 89, 580.2259903521, 160.19280253775514]]
    df = pd.DataFrame(list_a)
    X = df.iloc[:, 5]
    y = df.iloc[:, 6]
    X1 = X.values.reshape(-1, 1)
    y1 = y.values.reshape(-1, 1)
    clf = LinearRegression()
    clf.fit(X1, y1)
    yhat = clf.predict(X1)
    para_Intercept = clf.intercept_[0]
    para_X_Variable_1 = clf.coef_[0][0]
    # Built-in sum() over an (n, 1) array returns a 1-element array, hence the [0] below
    SS_Residual = sum((y1 - yhat) ** 2)
    SS_Total = sum((y1 - np.mean(y1)) ** 2)
    para_R_Square = 1 - float(SS_Residual[0]) / SS_Total
    adjusted_r_squared = 1 - (1 - para_R_Square) * (len(y1) - 1) / (len(y1) - X1.shape[1] - 1)
    # With a single predictor, Excel's Significance F equals the p-value of the slope
    para_a = linregress(X.astype(float), y)
    para_Significance_F = para_a.pvalue
    print("Intercept:"+str(para_Intercept))
    print("X_Variable_1:"+str(para_X_Variable_1))
    print("R_Square:" + str(para_R_Square[0]))
    print("Significance_F:" + str(para_Significance_F))



calculate_Intercept_X_Variable()

The output is:

Intercept:133.10871357512195
X_Variable_1:0.016460552337949654
R_Square:0.3039426453800934
Significance_F:0.6282563718649847
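
The linregress shortcut only covers a single predictor. As a rough sketch of how the same two numbers could be derived directly from the fitted LinearRegression model, the following computes R Square from the residuals and Significance F as the upper-tail probability of the overall F statistic, assuming the standard OLS definitions with an intercept. The helper name `regression_summary` and its dictionary keys are made up for illustration; they are not part of sklearn or scipy.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression


def regression_summary(X, y):
    """Hypothetical helper: fit OLS and return Excel-style summary values."""
    y = np.asarray(y, dtype=float)
    # Accept a flat list for one predictor, or an (n, p) array for several
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, p = X.shape

    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)

    ss_residual = float(np.sum(residuals ** 2))
    ss_total = float(np.sum((y - y.mean()) ** 2))
    r_square = 1.0 - ss_residual / ss_total

    # Overall F statistic: explained variance per model degree of freedom
    # divided by residual variance per residual degree of freedom
    df_model, df_residual = p, n - p - 1
    f_stat = ((ss_total - ss_residual) / df_model) / (ss_residual / df_residual)

    # Significance F is the probability of an F value at least this large
    significance_f = float(stats.f.sf(f_stat, df_model, df_residual))

    return {
        "intercept": float(model.intercept_),
        "coefficients": model.coef_.tolist(),
        "r_square": r_square,
        "significance_f": significance_f,
    }


# Same X and Y columns as in the question
x_values = [1884.7746222667, 665.6392779848, 580.2259903521]
y_values = [165.36153386251098, 125.30386609565328, 160.19280253775514]

summary = regression_summary(x_values, y_values)
for name, value in summary.items():
    print(name + ":" + str(value))
```

For the three rows in the question this should agree with the linregress result above, since with one predictor the F test and the slope's t test are equivalent. With more than one predictor, pass X as a two-dimensional array with one column per variable; the sketch needs more rows than predictors plus one, otherwise the residual degrees of freedom are zero.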