Question

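Thank you for taking the time to read my question.
I have to run ordinal Ridge and Lasso regression on my dataset. The value I want to predict is ordinal (5 levels), and I have many continuous predictors (more than 60), not all of which make sense logically. I therefore want to run ordinal regression with Lasso and Ridge penalties to find the important predictors. I am new to Python and don't really know how to do this, so I would appreciate any help from the community. I found the mord module, but (if I am using it correctly) it does not provide ordinal Lasso. Can anyone help? Thanks in advance.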

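Update: I wrote the following code. It runs without errors, but the accuracy is lower than in my previous analysis, so I think I am making a mistake somewhere. I would appreciate any help. I suspect the problem is in the scaling, but I don't know how to fix it. "rel" takes five values (1, 2, 3, 4, 5) and is the value I want to predict.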

import numpy as np
import pandas as pd
import mord
from sklearn.preprocessing import scale, StandardScaler
from sklearn.metrics import mean_squared_error
import csv

#defining a function to rotate numbers in an array 
def leftRotatebyOne(arr, n):
    temp = arr[0]
    for i in range(n-1):
        arr[i] = arr[i+1]
    arr[n-1] = temp

#defining OR to do Ordinal Ridge Regression
OR = mord.OrdinalRidge()

#defining the loop to go through all participants
for s in range(17):

    #reading the data for each participant
    df = pd.read_csv("Complete{0}.csv".format(s+1), index_col=0, header=None).dropna()
    df.index.name = 'subject{0}'.format(s+1)
    df.columns = ["ch{0}".format(i+1) for i in range(64)] +["irrel", "rel"]
    #defining output and predictors
    y = df.rel
    X = df.drop(['rel', 'irrel'], axis=1).astype('float64')

    #an array containing the trial numbers
    T = np.arange(480)

    #a matrix to hold the results of all 480 leave-one-out runs for this participant
    #(64 coefficients + predicted + error + accuracy = 67 rows)
    out=np.empty((67,480))

    #running the model for all trials (each time keeping one out)
    for t in range(480):

        T1 = T[:479]
        T2 = T[479:]   #the last one, which is going to be left out

        ## The last element is always the one left out; T is rotated on every pass, so the left-out trial changes

        #train samples
        X_train = X.iloc[T1,:]
        y_train = np.array(y.iloc[T1])

        scaler = StandardScaler().fit(X_train)

        #test sample
        X_test = X.iloc[T2,:]
        y_test = np.array(y.iloc[T2])

        #rotating T
        leftRotatebyOne(T,480)

        #running ordinal ridge regression from the module mord
        OR.fit(scaler.transform(X_train), y_train)
        predicted = OR.predict(scaler.transform(X_test))
        error = mean_squared_error(y_test, predicted)
        coeff = pd.Series(OR.coef_, index=X.columns)

        #getting the accuracy of each prediction
        if predicted == y_test:
            accuracy = 1
        else:
            accuracy = 0

        #having all results in a matrix (each column is for leaving out one of the trials)
        out[:,t]=np.hstack((coeff,predicted,error, accuracy))

    #saving the results for each participant 
    np.savetxt("reg{0}.csv".format(s+1), out, delimiter=',')

#saving all results in one file
filenames = ["reg{0}.csv".format(i+1) for i in range(17)]
#np.savetxt writes no header row, so the files are read with header=None
dataframes = [pd.read_csv(p, header=None) for p in filenames]
merged_dataframe = pd.concat(dataframes, axis=1)
merged_dataframe.to_csv("merged.csv", index=False, header=False)

#reading the file that contains all the models for all the participants
cl = pd.read_csv("merged.csv", header=None).dropna()

#naming the rows
row_names = ["ch{0}".format(i+1) for i in range(64)] + ["predicted", "error", "accuracy"]
cl.index = row_names

#calculating the mean of each row
print(cl.mean(axis=1))

#getting the mean of accuracy for each participant
for s in range(17):
    regg = pd.read_csv("reg{0}.csv".format(s+1), header=None).dropna()
    regg.index = row_names

    print(regg.mean(axis=1)["accuracy"])

I found nothing apart from the mord module. I want to do leave-one-out cross-validation, so I only need to hold out one sample for testing each time.
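For reference, leave-one-out cross-validation can also be written without rotating an index array by hand. The following is only a minimal sketch under assumptions: the data are synthetic (not the EEG files from the question), and plain `Ridge` from scikit-learn stands in for the ordinal model, with its predictions rounded and clipped to the levels 1 to 5. Since mord's estimators follow the scikit-learn interface, `mord.OrdinalRidge()` could presumably be dropped in instead, but that is not tested here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one participant's data (assumption, not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
latent = 1.5 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=60)
# Cut the latent score at its quintiles to get five ordered levels 1..5.
y = np.digitize(latent, np.quantile(latent, [0.2, 0.4, 0.6, 0.8])) + 1

# The scaler sits inside the pipeline, so it is refit on each training fold only.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# One prediction per sample, each made by a model that never saw that sample.
raw = cross_val_predict(model, X, y, cv=LeaveOneOut())
# Ridge returns real numbers; round and clip them to the valid levels.
predicted = np.clip(np.rint(raw), 1, 5).astype(int)

accuracy = np.mean(predicted == y)
print("leave-one-out accuracy:", accuracy)
```

Putting the scaler in the pipeline keeps the held-out sample out of the scaling statistics, which is the same thing the manual loop does by fitting `StandardScaler` on `X_train` only.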

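PS. I am following the instructions at this link:
http://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%206.ipynb
Following the steps exactly gives the following error:
module 'glmnet' has no attribute 'ElasticNet'

*However, they do not cover ordinal regression.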
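One way to get something Lasso-like for an ordinal target using only scikit-learn is a threshold decomposition in the spirit of Frank and Hall's ordinal classification: fit one L1-penalised binary logistic regression per threshold ("is the level greater than k?") and count how many thresholds a sample is predicted to exceed. This is a rough sketch, not a proportional-odds model and not what mord implements. The data, `C=0.1`, and the helper name `predict_ordinal` are assumptions made for illustration, and the counting rule is a simplification of the original method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): only the first two predictors carry signal.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 10
X = rng.normal(size=(n_samples, n_features))
latent = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)
y = np.digitize(latent, np.quantile(latent, [0.2, 0.4, 0.6, 0.8])) + 1

X_std = StandardScaler().fit_transform(X)
levels = np.unique(y)  # assumed to be consecutive integers, here 1..5

# One L1-penalised binary model per threshold: "is y greater than level k?"
models = []
for k in levels[:-1]:
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X_std, (y > k).astype(int))
    models.append(clf)

def predict_ordinal(X_new):
    # Count how many thresholds each sample is predicted to exceed.
    exceeded = np.column_stack([m.predict(X_new) for m in models])
    return levels[0] + exceeded.sum(axis=1)

predicted = predict_ordinal(X_std)  # in-sample predictions, for illustration only
coefs = np.vstack([m.coef_.ravel() for m in models])  # one row per threshold
# A predictor counts as selected if any threshold model gives it a non-zero weight.
selected = np.flatnonzero(np.any(coefs != 0, axis=0))
print("selected predictors:", selected)
```

The L1 penalty drives the weights of uninformative predictors to exactly zero, so `selected` plays the role of the "important predictors". Accuracy should be estimated on held-out data, for example with leave-one-out, rather than on the training set as in this sketch.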

Answer 1

You can use sklearn for this:

from sklearn import linear_model

regr_lasso = linear_model.Lasso(alpha=0.1)

regr_ridge = linear_model.Ridge(alpha=1.0)

regr_elasticnet = linear_model.ElasticNet(random_state=0)

For more details, see the following link: http://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_coordinate_descent_path.html
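As a rough illustration of how these estimators can be used to see which predictors matter, here is a small sketch on synthetic data (an assumption, not the asker's dataset). Note that `Lasso` and `Ridge` treat the target as a continuous number, so this is ordinary penalised regression rather than ordinal regression.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): only columns 0 and 3 influence the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 12))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

# Penalised models are sensitive to feature scale, so standardise first.
X_std = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=0.1).fit(X_std, y)
ridge = Ridge(alpha=1.0).fit(X_std, y)

# Lasso sets some coefficients to exactly zero; Ridge only shrinks them.
kept = np.flatnonzero(lasso.coef_)
print("predictors kept by Lasso:", kept)
print("non-zero Ridge coefficients:", np.count_nonzero(ridge.coef_))
```

The indices in `kept` are the predictors Lasso retains at this `alpha`; a larger `alpha` keeps fewer. Ridge keeps every predictor, so it helps with stability but not with selection.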

Ordinal Ridge and Lasso regression in Python
