Question

I am trying to fit ridge regression, lasso regression, and randomForest models to the total replacement cost in a CSV file.

Here is what I did:

# Model the log of the replacement value
data$TOTAL_REPLACEMENT_VALUE=log(data$TOTAL_REPLACEMENT_VALUE)
n_total=nrow(data)
n_train=round(n_total*0.7)
# 70/30 split by row order
training_data=data[1:n_train,]
# Parentheses are required here: `:` binds tighter than `+`
test_data=data[(n_train+1):n_total,]
# Design matrices and response vectors for glmnet
X_train_cost_model=model.matrix(TOTAL_REPLACEMENT_VALUE~TYPE,data=training_data)
X_test_cost_model=model.matrix(TOTAL_REPLACEMENT_VALUE~TYPE,data=test_data)
Y_train_cost=training_data[,"TOTAL_REPLACEMENT_VALUE"]
Y_test_cost=test_data[,"TOTAL_REPLACEMENT_VALUE"]
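A note on the test-set index, since it is easy to get wrong: in R the `:` operator has higher precedence than `+`, so `n_train+1:n_total` is read as `n_train+(1:n_total)`, which runs past the end of the data frame and yields all-NA rows. The sketch below illustrates this on a toy data frame; the sizes and the `toy` object are made up for illustration and are not taken from the original data.

```r
# Hypothetical sizes, chosen only for illustration
n_total <- 10
n_train <- 7

# `:` is evaluated first, so this is n_train + (1:n_total), i.e. 8, 9, ..., 17
wrong_idx <- n_train + 1:n_total

# The parenthesised form gives the intended remainder: 8, 9, 10
right_idx <- (n_train + 1):n_total

# Toy data frame standing in for the real data
toy <- data.frame(id = 1:n_total)

# Row indices beyond nrow(toy) come back as rows filled with NA
wrong_test <- toy[wrong_idx, , drop = FALSE]
right_test <- toy[right_idx, , drop = FALSE]

print(nrow(wrong_test))           # same length as wrong_idx
print(sum(is.na(wrong_test$id)))  # number of out-of-range (NA) rows
print(nrow(right_test))           # number of genuine test rows
```

If the unparenthesised form is used, the NA rows in the test set would probably surface later, when predicting on `test_data`.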

I then fit the ridge and lasso regressions like this:

install.packages("glmnet",dependencies = TRUE)
library(glmnet)
# Ridge regression (alpha=0), lambda chosen by cross-validated MSE
ridge_replacement_cost_model=cv.glmnet(X_train_cost_model,Y_train_cost,alpha=0,type.measure = "mse")
ridge_pred_replacement_cost=predict(ridge_replacement_cost_model,newx = X_test_cost_model,exact=TRUE,s="lambda.min")
# Lasso regression (alpha=1), same procedure
lasso_replacement_cost_model=cv.glmnet(X_train_cost_model,Y_train_cost,alpha=1,type.measure = "mse")
lasso_pred_replacement_cost=predict(lasso_replacement_cost_model,newx = X_test_cost_model,exact=TRUE,s="lambda.min")

install.packages("randomForest")
library(randomForest)
rf_total_replacement_cost_model=randomForest(TOTAL_REPLACEMENT_VALUE~TYPE,data=training_data,importance=TRUE)
rf_pred_replacement_cost=predict(rf_total_replacement_cost_model,test_data,type="class")

However, I ran into these errors:

Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda,  : 
  number of observations in y (590) not equal to the number of rows of x (589)

Error in na.fail.default(list(TOTAL_REPLACEMENT_VALUE = c(18.126980599175,  : 
  missing values in object

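The first error occurs when I run the ridge and lasso regressions, and the second occurs when I run the randomForest model. I know there are threads on similar problems, but I cannot figure out what is going wrong here. Any help is much appreciated.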
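One plausible reading of both messages, offered as a guess rather than a confirmed diagnosis: the training rows contain at least one missing value. `model.matrix()` builds a model frame that, under R's default `na.action` option (`na.omit`), silently drops incomplete rows, while the response taken directly from the data frame keeps them, which would give 589 rows of x against 590 values of y. `randomForest()` with a formula uses `na.fail` by default, so it stops on the same rows. Note also that `log()` returns `NaN` for negative inputs. The sketch below reproduces the mismatch on made-up data; the `toy` values are assumptions for illustration only.

```r
# Toy data standing in for the real CSV (values are made up)
toy <- data.frame(
  TOTAL_REPLACEMENT_VALUE = c(10.2, 11.5, 9.8, 12.1, 10.9),
  TYPE = factor(c("A", "B", NA, "A", "B"))
)

# model.matrix() drops the row with the missing TYPE ...
X <- model.matrix(TOTAL_REPLACEMENT_VALUE ~ TYPE, data = toy)
# ... but direct column extraction keeps it
Y <- toy[, "TOTAL_REPLACEMENT_VALUE"]
print(c(rows_x = nrow(X), length_y = length(Y)))

# Count missing values per column to locate the problem
print(colSums(is.na(toy)))

# One possible remedy: keep only complete rows before building x and y
toy_complete <- toy[complete.cases(toy), ]
X_ok <- model.matrix(TOTAL_REPLACEMENT_VALUE ~ TYPE, data = toy_complete)
Y_ok <- toy_complete[, "TOTAL_REPLACEMENT_VALUE"]
print(nrow(X_ok) == length(Y_ok))
```

Applied to the real data, the same idea would be to run `colSums(is.na(data))` after the log transform and filter with `complete.cases(data)` before splitting. Whether dropping rows is appropriate, as opposed to imputing, depends on the data.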

Errors in cv.glmnet for ridge and lasso regression, and an error in randomForest

0 answers: