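Thank you for taking the time to read my question.
I need to run ordinal Ridge and Lasso regression on my dataset. The value I want to predict is ordinal (5 levels), and I have many continuous predictors (more than 60), not all of which make sense logically. I therefore want to run ordinal regression with Lasso and Ridge penalties to find the important predictors.
I am new to Python and do not really know how to do this, so I would appreciate any help from the community.
I found the mord module, but (even assuming I am using it correctly) it does not provide an ordinal Lasso.
Can anyone help me?
Thanks in advance.
Update: I wrote the following code. It runs without errors, but the accuracy is lower than in my previous analysis, so I think I am making a mistake somewhere. I suspect it may be in the scaling, but I do not know how to fix it. I would be grateful for any help. "rel" takes five values (1, 2, 3, 4, 5) and is the value I want to predict.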
import numpy as np
import pandas as pd
import mord
from sklearn.preprocessing import scale, StandardScaler
from sklearn.metrics import mean_squared_error
import csv
#defining a function to rotate numbers in an array (in place, one step to the left)
def leftRotatebyOne(arr, n):
    temp = arr[0]
    for i in range(n-1):
        arr[i] = arr[i+1]
    arr[n-1] = temp
#defining OR to do Ordinal Ridge Regression
OR = mord.OrdinalRidge()
#defining the loop to go through all participants
for s in range(17):
    #reading the data for each participant
    df = pd.read_csv("Complete{0}.csv".format(s+1), index_col=0, header=None).dropna()
    df.index.name = 'subject{0}'.format(s+1)
    df.columns = ["ch{0}".format(i+1) for i in range(64)] + ["irrel", "rel"]
    #defining output and predictors
    y = df.rel
    X = df.drop(['rel', 'irrel'], axis=1).astype('float64')
    #an array containing trial numbers
    T = np.array(range(480))
    #defining a matrix to hold the models of all runs (480 leave-one-out) for each participant
    #rows: 64 coefficients + predicted + error + accuracy = 67
    out = np.empty((67, 480))
    #running the model for all trials (each time keeping one out)
    for t in range(480):
        T1 = T[:479]
        T2 = T[479:] #the last one, which is going to be left out
        ## Always the last one is left out; we rotate T, so the last trial changes
        #train samples
        X_train = X.iloc[T1, :]
        y_train = np.array(y.iloc[T1])
        #the scaler is fitted on the training samples only
        scaler = StandardScaler().fit(X_train)
        #test sample
        X_test = X.iloc[T2, :]
        y_test = np.array(y.iloc[T2])
        #rotating T
        leftRotatebyOne(T, 480)
        #running ordinal ridge regression from the module mord
        OR.fit(scaler.transform(X_train), y_train)
        predicted = OR.predict(scaler.transform(X_test))
        error = mean_squared_error(y_test, predicted)
        coeff = pd.Series(OR.coef_, index=X.columns)
        #getting the accuracy of each prediction (1 if correct, 0 otherwise)
        accuracy = int(predicted[0] == y_test[0])
        #having all results in a matrix (each column is for leaving out one of the trials)
        out[:, t] = np.hstack((coeff, predicted, error, accuracy))
    #saving the results for each participant
    np.savetxt("reg{0}.csv".format(s+1), out, delimiter=',')
#saving all results in one file
filenames = ["reg{0}.csv".format(i+1) for i in range(17)]
#the reg files have no header row, so read them with header=None
dataframes = [pd.read_csv(p, header=None) for p in filenames]
merged_dataframe = pd.concat(dataframes, axis=1)
merged_dataframe.to_csv("merged.csv", index=False, header=False)
#reading the file that contains all the models for all the participants
cl = pd.read_csv("merged.csv", header=None).dropna()
#naming the rows
cl.index = ["ch{0}".format(i+1) for i in range(64)] + ["predicted", "error", "accuracy"]
#calculating the mean of each row
print(cl.mean(axis=1))
#getting the mean of accuracy for each participant
for s in range(17):
    regg = pd.read_csv("reg{0}.csv".format(s+1), header=None).dropna()
    regg.index = ["ch{0}".format(i+1) for i in range(64)] + ["predicted", "error", "accuracy"]
    print(regg.mean(axis=1)["accuracy"])
I could not find anything other than the mord module. I want to do leave-one-out cross-validation, holding out just one sample for testing each time.
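For reference, the leave-one-out scheme described here can also be written with scikit-learn's `LeaveOneOut` splitter instead of rotating an index array by hand. The following is only a minimal sketch under stated assumptions: the data are synthetic stand-ins (not the real `Complete*.csv` files), and a plain `Ridge` followed by rounding and clipping to 1..5 stands in for an ordinal ridge model. That substitution is an assumption of this sketch, not a statement about how mord works internally.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 60 trials, 8 predictors, labels 1..5.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
latent = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=60)
# Cut the latent score at its quintiles to get five ordered levels.
y = np.digitize(latent, np.quantile(latent, [0.2, 0.4, 0.6, 0.8])) + 1

predictions = np.empty(len(y), dtype=int)
for train_idx, test_idx in LeaveOneOut().split(X):
    # The scaler sits inside the pipeline, so it is refit on the training rows only.
    model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    model.fit(X[train_idx], y[train_idx])
    raw = model.predict(X[test_idx])
    # Round to the nearest level and clip to the valid range 1..5.
    predictions[test_idx] = np.clip(np.rint(raw), 1, 5).astype(int)

accuracy = np.mean(predictions == y)
mean_abs_error = np.mean(np.abs(predictions - y))
print("LOO accuracy: {:.3f}, mean absolute error: {:.3f}".format(accuracy, mean_abs_error))
```

Keeping the scaler inside the pipeline means the held-out trial never influences the scaling statistics. For an ordinal target, the mean absolute error is often more informative than exact-match accuracy, because it distinguishes a miss by one level from a miss by four.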
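P.S.
I was following the instructions at this link:
http://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%206.ipynb
Following those steps exactly produces this error:
module 'glmnet' has no attribute 'ElasticNet'
*However, they do not cover ordinal regression.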
Answer 0 (score: 0)
You can use sklearn for this:
from sklearn import linear_model
regr_lasso = linear_model.Lasso(alpha=0.1)
regr_ridge = linear_model.Ridge(alpha=1.0)
regr_elasticnet = linear_model.ElasticNet(random_state=0)
For more details, see the following link: http://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_coordinate_descent_path.html
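As a rough illustration of how these estimators could be used to screen predictors, here is a minimal sketch. Note the assumptions: the data are synthetic stand-ins, the channel names are hypothetical, and the five-level label is simply treated as a number, so this is ordinary Lasso regression rather than a true ordinal model. `alpha=0.1` is the value shown above, not a tuned one.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 200 trials, 10 predictors,
# of which only the first two actually drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
latent = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
# Five ordered levels (1..5), treated as a numeric target below.
y = np.digitize(latent, np.quantile(latent, [0.2, 0.4, 0.6, 0.8])) + 1
names = ["ch{0}".format(i + 1) for i in range(X.shape[1])]

# Standardize so the L1 penalty treats every predictor on the same scale.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)

# Predictors whose coefficients were not shrunk to exactly zero.
selected = [n for n, c in zip(names, lasso.coef_) if c != 0.0]
print("predictors with non-zero coefficients:", selected)
```

Scaling the whole dataset at once is acceptable here only because the goal is to inspect coefficients. When estimating prediction accuracy, the scaler should be fitted on the training rows of each split.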