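Low accuracy with Lasso and Ridge regression

Time: 2019-04-29 07:36:00

Tags: python machine-learning regression lasso

I am applying Lasso regression and Ridge regression to the forest fires sample dataset, but my accuracy is much lower than it should be.

I have already tried changing alpha and the training set size.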


1 answer:

Answer 0: (score: 0)

Regarding your question: I don't see any cross-validated (CV) regression in your code. Trying LassoCV, ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1]) and RidgeCV is always a good start for finding reasonable alpha values. For Ridge, RidgeCV is the CV algorithm. Compared with LassoCV and ElasticNetCV, RidgeCV uses LOO-CV and takes a fixed set of alpha values, so it needs more user handling to get the best output. Take the following code example:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LassoCV, ElasticNetCV
from sklearn.linear_model import Ridge, RidgeCV

forest = pd.read_csv('forestfires.csv')

# Encode the month and day names as integers
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
forest['month'] = forest['month'].map({m: i for i, m in enumerate(months, start=1)})
forest['day'] = forest['day'].map({d: i for i, d in enumerate(days, start=1)})

# iloc selects by position (loc selects by label):
# the first 12 columns are the features, column 12 is the target
X = forest.iloc[:, 0:12].values
y = forest.iloc[:, 12].values

# 70/30 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

# Plain linear regression as a baseline
lr = LinearRegression()

# The cross validation algorithms:
lasso_cv = LassoCV()  # LassoCV will try to find the best alpha for you
# ElasticNetCV will try to find the best alpha for you,
# for a given set of l1_ratio values (mixes of the Ridge and Lasso penalties)
enet_cv = ElasticNetCV()
ridge_cv = RidgeCV()

lr.fit(X_train, y_train)
lasso_cv.fit(X_train, y_train)
enet_cv.fit(X_train, y_train)
ridge_cv.fit(X_train, y_train)

# Ridge regression models with fixed alphas
rr = Ridge(alpha=0.01)
rr.fit(X_train, y_train)
rr100 = Ridge(alpha=100)

Now check the alpha values that were found:

print('LassoCV alpha:', lasso_cv.alpha_)
print('RidgeCV alpha:', ridge_cv.alpha_)
print('ElasticNetCV alpha:', enet_cv.alpha_, 'ElasticNetCV l1_ratio:', enet_cv.l1_ratio_)
ridge_alpha = ridge_cv.alpha_
enet_alpha, enet_l1ratio = enet_cv.alpha_, enet_cv.l1_ratio_

Center the new alphas for RidgeCV and/or the new l1_ratio values for ElasticNetCV around these values (l1_ratio values < 0 and > 1 will be ignored by ElasticNetCV):

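# candidate values within +/-10% of the ones found above;
# l1_ratio candidates outside (0, 1] are dropped explicitly
enet_new_l1ratios = [enet_l1ratio * mult for mult in [.9, .95, 1, 1.05, 1.1] if 0 < enet_l1ratio * mult <= 1]
ridge_new_alphas = [ridge_alpha * mult for mult in [.9, .95, 1, 1.05, 1.1]]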

# fit Enet and Ridge again:
enet_cv = ElasticNetCV(l1_ratio=enet_new_l1ratios)
ridge_cv = RidgeCV(alphas=ridge_new_alphas)

enet_cv.fit(X_train, y_train)
ridge_cv.fit(X_train, y_train)
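
To see whether a refit like the one above changes anything, one option is to score the coarse and refined models on the held-out test set. The following is only a sketch under stated assumptions: it uses synthetic data from make_regression in place of forestfires.csv, and the names ridge_coarse, ridge_fine and scores are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the forest fires data (assumption): 12 features
X, y = make_regression(n_samples=200, n_features=12, n_informative=5,
                       noise=10.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=3)

# Coarse pass: RidgeCV only tries the alphas it is given,
# so start from a wide log-spaced grid
ridge_coarse = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_train, y_train)
enet_cv = ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1], cv=5)
enet_cv.fit(X_train, y_train)

# Fine pass: new alphas centered on the coarse winner
fine_alphas = [ridge_coarse.alpha_ * m for m in (.9, .95, 1, 1.05, 1.1)]
ridge_fine = RidgeCV(alphas=fine_alphas).fit(X_train, y_train)

# score() returns R^2 on the held-out data
scores = {
    "ridge_coarse": ridge_coarse.score(X_test, y_test),
    "ridge_fine": ridge_fine.score(X_test, y_test),
    "enet": enet_cv.score(X_test, y_test),
}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

If the refined score barely moves, the alpha value is probably not what limits the model, and the features or the model choice deserve a closer look.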

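This should be a first step toward finding suitable alpha values and/or l1 ratios for your model. Of course, other steps, such as feature engineering and choosing the right model (for instance Lasso, which performs feature selection), should come before the search for suitable parameters. :)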
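
As an illustration of the feature selection remark, here is a minimal sketch, again assuming synthetic data rather than the forest fires file. The scaling step is my own addition, not part of the answer above: Lasso penalizes all coefficients equally, so features on very different scales are usually standardized first.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 12 features, only 4 of them informative
X, y = make_regression(n_samples=200, n_features=12, n_informative=4,
                       noise=5.0, random_state=0)

# Standardize, then let LassoCV pick alpha by 5-fold cross-validation
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name
lasso = model.named_steps["lassocv"]

# Features whose coefficient was shrunk to exactly zero are dropped
selected = np.flatnonzero(lasso.coef_)
print("chosen alpha:", lasso.alpha_)
print("features kept:", selected.tolist())
```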