Multivariate regression in Python

Time: 2019-09-05 22:11:32

Tags: python regression multivariate-testing

I want to run a multivariate linear regression in Python on several arrays of dependent data and several arrays of independent data.

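I have seen plenty of MULTIPLE linear regression examples, with several independent inputs, and almost everyone assumes multi = multivariate, but that is not the case. I cannot find any genuinely multivariate tutorial on the internet. What I want is multiple outputs + multiple inputs.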
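
To make the distinction concrete, the two models can be written as below. The notation (n observations, p inputs, q outputs, and X assumed to include a column of ones for the intercept) is introduced here for illustration and does not come from the original post.

```latex
% Multiple linear regression: one output, p inputs
y = X\beta + \varepsilon,
\qquad y \in \mathbb{R}^{n},\; X \in \mathbb{R}^{n \times p},\; \beta \in \mathbb{R}^{p}

% Multivariate linear regression: q outputs, p inputs
Y = XB + E,
\qquad Y \in \mathbb{R}^{n \times q},\; X \in \mathbb{R}^{n \times p},\; B \in \mathbb{R}^{p \times q}
```

Under ordinary least squares, each column of B equals the coefficient vector obtained by regressing the corresponding column of Y on X by itself.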

from pandas import DataFrame
from sklearn import linear_model
import statsmodels.api as sm

Stock_Market = {'Year': [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
                'Agriculture': [1, 0.8965517282485962, 0.4350132942199707, 0.5384615659713745, 1.1071428582072258, 0.1071428582072258, 0.1290322244167328, -0.07096776366233826, -0.37857140600681305, -0.439440980553627, -0.2020460031926632, -0.16339869424700737, 2.277777746319771], 
                'Demand_risk':[1,0.015701416,0.638652235,0.744531459,0.630988038,0.787568771,1.796302615,1.708789548,1.897916832,1.643077606,1.579785002,2.444568612,2.626896547],
                'International_risk':[1,1.609574468,1.225836431,1.30566937,1.771415837,1.737162303,2.156292933,2.365513975,2.502820771,2.660719511,2.468833192,2.624733983,2.577283326],
                'Production_risk': [1,0.76346912,1.421097464,1.423616355,1.434009229,1.307186577,1.378837063,1.3577073,1.744395371,1.744281735,1.559044776,1.570226289,1.116485043],
                'Technology_risk': [1,1.029845201,1.042711964,1.053634438,1.038367263,0.659816279,0.90179752,1.448686704,1.836091216,1.644680334,1.413661748,1.089683923,1.191047799]        
                }


df = DataFrame(Stock_Market,columns=['Year','Agriculture','Demand_risk','International_risk','Production_risk', 'Technology_risk']) 

# four input (independent) variables; add or remove column names inside the brackets to change the inputs
X = df[['Demand_risk','International_risk','Production_risk', 'Technology_risk']]
Y = df['Year', 'Agriculture'] # output variable (what we are trying to predict)

# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

# with statsmodels, adding the intercept column manually
X1 = sm.add_constant(X)
result = sm.OLS(Y, X1).fit()
# print(dir(result))
print(result.rsquared, result.rsquared_adj)

I want to change the output variable Y so that it can handle multiple arrays rather than just one (right now it raises an error).
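
One likely cause of the error is the single pair of brackets in `df['Year', 'Agriculture']`: pandas treats that as a lookup of one column keyed by a tuple and raises a `KeyError`, whereas a list of column names inside double brackets selects a two-column frame. The following is a minimal sketch, not a definitive answer. It uses synthetic data and hypothetical column names (`x1`, `x2`, `y1`, `y2`) rather than the data in the question, and relies on scikit-learn's `LinearRegression` accepting a 2-D target.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only (not the data from the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=20), "x2": rng.normal(size=20)})
df["y1"] = 1.0 + 2.0 * df["x1"] + 3.0 * df["x2"]
df["y2"] = -1.0 + 0.5 * df["x1"] - 2.0 * df["x2"]

X = df[["x1", "x2"]]  # inputs, shape (20, 2)
Y = df[["y1", "y2"]]  # double brackets: two output columns, shape (20, 2)

regr = LinearRegression()
regr.fit(X, Y)

# One intercept and one row of coefficients per output column.
print("Intercepts:", regr.intercept_)  # shape (2,)
print("Coefficients:\n", regr.coef_)   # shape (2, 2): (n_outputs, n_inputs)
```

Applied to the question's frame, the analogous change would be `Y = df[['Year', 'Agriculture']]`. Where per-output statistics such as R² are needed, one option is to fit a separate model for each output column.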

0 answers:

No answers yet