How do I get the accuracy of a RandomForest model in Python?

Time: 2019-05-01 21:24:25

Tags: python linear-regression random-forest

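I have this script, which uses RandomForest and LinearRegression to predict values for a second dataset. It runs fine, but the linear regression's accuracy is 18%, which is terrible.

So I tried RandomForest instead, but I don't know how to compute the accuracy of that model.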

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

data = pd.read_csv('EncuestaVieja.csv')
X = data[['Edad','Sexo','v1','v2','v3']]
y = data['Alumna']

dataP = pd.read_csv('EncuestaVieja_test.csv')
X_p = dataP[['Edad','Sexo','v1','v2','v3']]
y_p = dataP['Alumna']

dataT = pd.read_csv('EncuestaVieja_test_2.csv')
X_t = dataT[['Edad','Sexo','v1','v2','v3']]
y_t = dataT['Alumna']
regr = LinearRegression()

regr.fit(X, y)

lr = RandomForestRegressor(n_estimators=50)
lr.fit(X, y)

X_test = pd.read_csv('EncuestaNueva.csv')[['Edad','Sexo','v1','v2','v3']]

predictions = regr.predict(X_test)


predictions2 = lr.predict(X_test)
print( 'RandomForest Accuracy: ')
print(((predictions2)))
print( '')
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_p,y_p)
accuracy = regressor.score(X_t,y_t)
print( 'Linear Regression Accuracy: ', accuracy*100,'%')
print(((predictions)))

Output:

RandomForest Accuracy: 
[ 1.64  2.54  2.6   2.38  1.64  1.32  1.68  2.56  3.    2.28  2.38  2.68
  2.9   2.5   2.78  1.96  1.56  2.6   2.12  2.76  2.74  1.66  1.68  2.12
  2.3   2.36  2.28  2.28  2.82  1.7   1.86  2.36  1.24]

Linear Regression Accuracy:  18.1336149086 %
[ 1.2681851   1.02802219  3.13377072  2.96885127  2.30808853  1.98814349
  2.39233726  2.8638321   1.86640316  2.63073399  2.21166731  2.25201016
  1.95065189  2.65360517  3.08855254  1.01229733  2.18225606  2.41802534
  2.43539473  2.50227407  1.71105799  1.88238089  2.12152321  3.33525397
  2.72820453  2.43241713  2.88757874  2.6242382   2.63087916  1.98379487
  2.25430306  1.96810279  0.8554685 ]

2 answers:

Answer 0: (score: 0)

I believe this is handled by the score() method:

lr.score(x_test, y_test)

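This returns the model's R^2 value. In your case, it looks like you only have an x_test (the script reads only the feature columns from EncuestaNueva.csv), so you would need to score on a set with known targets, such as X_t and y_t. Note that this is not accuracy. Regression models are not evaluated with accuracy the way classification models are. Instead, other metrics are computed, such as the mean squared error or the coefficient of determination. These metrics show how closely the predicted values match the known values, or how well the regression model fits the data.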
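To make that concrete, here is a minimal sketch of scoring a RandomForestRegressor on held-out data. It does not use the question's CSV files: the data comes from `make_regression` as a synthetic stand-in, and the function name `evaluate_forest`, the sample count, the noise level, and the 25% test split are all assumptions picked for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_forest(random_state=0):
    """Fit a forest on synthetic data and return held-out regression metrics."""
    # Synthetic stand-in for the survey data (5 features, as in the question).
    X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                           random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state)

    model = RandomForestRegressor(n_estimators=50, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    return {
        # score() on a regressor is R^2, not classification accuracy.
        "r2_from_score": model.score(X_test, y_test),
        "r2_from_metric": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
    }


if __name__ == "__main__":
    for name, value in evaluate_forest().items():
        print(name, round(value, 4))
```

The two R^2 entries should agree, since `score()` on a regressor computes the same quantity as `r2_score` applied to its own predictions. With the question's data, the equivalent call would be along the lines of `lr.score(X_t, y_t)`.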

Answer 1: (score: 0)

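import math
from sklearn.metrics import mean_squared_error

# actual: known target values; predicted: model output for the same rows
mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)
# RMSE is an error in the target's units (lower is better), not a percentage
print('RMSE for Random Forest:', rmse)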
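As a small worked example of the RMSE computation in the snippet above, the sketch below uses made-up `actual` and `predicted` arrays (they are not taken from the question's data) and checks the library result against the formula written out by hand.

```python
import math

import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration only.
actual = np.array([1.0, 2.0, 3.0, 2.0])
predicted = np.array([1.5, 2.0, 2.5, 3.0])

# Squared errors are 0.25, 0.0, 0.25, 1.0, so the mean is 0.375.
mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)

# Same quantity computed directly from the definition.
rmse_by_hand = float(np.sqrt(np.mean((actual - predicted) ** 2)))

print("MSE:", mse)
print("RMSE:", rmse)
```

Because RMSE is expressed in the same units as the target, it is easiest to interpret next to the range of the target values rather than as a percentage.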