Training a dataset in RStudio

Time: 2019-08-14 21:19:35

Tags: r nan training-data

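I split my dataset into a 70% training set and a 30% validation set. The data contain many NaN values, which may be why I cannot train a model. Splitting the data into training and testing sets works, but when I try to train I get this error: "na.fail.default(list(ndvi = c(0.426755102040816, 0.409, 0.501735849056604, : missing values in object".

I want to estimate biomass from NDVI and then look at how the estimates relate to the observed biomass.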

set.seed(123)
inTrain = createDataPartition(newdata$ndvi, p = 0.7, list = FALSE)
training = newdata[ inTrain,]
testing = newdata[-inTrain,]
cols <- c("ndvi", "first", "second", "third","DMY_kg_ha")
newdata[cols] <- lapply(newdata[cols], factor)  ## as.factor() could also be used
set.seed(32343)
modelFit<-train(DMY_kg_ha~first+second+third+treatment, data=training, method='glm',na.rm = na.omit)
modelFit
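
The error names `na.fail.default`, which is what R runs when a modelling function's `na.action` is `na.fail` and the data contain missing values; the formula interface of caret's `train` uses `na.action = na.fail` by default. A minimal base-R sketch on a small made-up data frame (the `toy` name and its values are hypothetical, not the asker's data) shows the failure and what `na.omit` does instead:

```r
# Hypothetical toy data with one missing NDVI value
toy <- data.frame(
  ndvi      = c(0.43, 0.41, NA, 0.50),
  DMY_kg_ha = c(2100, 1950, 2300, 2600)
)

# na.fail() stops as soon as it finds a missing value;
# tryCatch() captures the error message instead of aborting
msg <- tryCatch(
  {
    na.fail(toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# na.omit() drops the incomplete rows instead of failing
clean <- na.omit(toy)
print(nrow(clean))  # 3 rows remain
```

Passing `na.action = na.omit` to the modelling call, or cleaning the data before the split, avoids the error in the same way.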

Once the model is fitted, I want to use 'vif' in R to find out which variables are important.
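
A variance inflation factor measures how strongly one predictor is explained by the others (collinearity), which is related to, but not the same as, importance. For predictor j it is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on the remaining ones. The sketch below computes it by hand in base R for numeric predictors only, on simulated data; the helper `vif_manual` and the columns `x1`, `x2`, `x3` are hypothetical names used for illustration. If the car package is installed, `car::vif()` applied to the fitted glm (for a caret fit, presumably `modelFit$finalModel`) is the more usual route and also handles factor predictors.

```r
# Manual VIF for numeric predictors: VIF_j = 1 / (1 - R^2_j)
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    # regress predictor v on all the other predictors
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                 # generated independently of x1 and x2

vifs <- vif_manual(data.frame(x1 = x1, x2 = x2, x3 = x3))
print(round(vifs, 2))
```

Here `x1` and `x2` should show large values and `x3` a value close to 1; a common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity.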

1 answer:

Answer 0: (score: 0)

Try this:

# load library
library(caret)

# set seed value
set.seed(123)

# remove NA's in data
newdata = na.omit(newdata)

# split data set
inTrain = createDataPartition(newdata$ndvi, p = 0.7, list = FALSE)
training = newdata[ inTrain,]
testing = newdata[-inTrain,]

# convert columns to factors
# (this runs after the split, so training and testing keep their original column types)
cols <- c("ndvi", "first", "second", "third","DMY_kg_ha")
newdata[cols] <- lapply(newdata[cols], factor)  ## as.factor() could also be used

# reset seed value
set.seed(32343)

# train model; na.action = na.omit drops any rows that still have missing values
modelFit <- train(DMY_kg_ha ~ first + second + third + treatment, data = training, method = 'glm', na.action = na.omit)

# view model
modelFit
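
The stated goal is to estimate biomass and compare it with the observed values, so a compact end-to-end sketch may help. It uses simulated data (every value is hypothetical; only the column names `ndvi`, `treatment`, and `DMY_kg_ha` are borrowed from the question), keeps `DMY_kg_ha` numeric so the glm is a regression, drops only rows with missing values in the columns the model uses, and fits with base R's `glm` rather than caret so that it runs without extra packages. The base `sample` call stands in for `createDataPartition`, which additionally balances the split on the vector passed to it.

```r
set.seed(123)

# Simulated stand-in for newdata (hypothetical values)
n <- 100
newdata <- data.frame(
  ndvi      = runif(n, 0.2, 0.8),
  treatment = factor(sample(c("A", "B"), n, replace = TRUE))
)
newdata$DMY_kg_ha <- 500 + 4000 * newdata$ndvi + rnorm(n, sd = 200)
newdata$ndvi[sample(n, 10)] <- NA   # inject some missing values

# Keep rows that are complete in the columns the model uses
model_cols <- c("DMY_kg_ha", "ndvi", "treatment")
complete   <- newdata[complete.cases(newdata[, model_cols]), ]

# 70/30 split with base R
idx      <- sample(nrow(complete), size = round(0.7 * nrow(complete)))
training <- complete[idx, ]
testing  <- complete[-idx, ]

# Gaussian GLM with a numeric response, i.e. ordinary linear regression
fit <- glm(DMY_kg_ha ~ ndvi + treatment, data = training, family = gaussian())

# Compare predicted and observed biomass on the held-out rows
pred <- predict(fit, newdata = testing)
r    <- cor(pred, testing$DMY_kg_ha)
rmse <- sqrt(mean((pred - testing$DMY_kg_ha)^2))
print(c(r = r, rmse = rmse))
```

Restricting `complete.cases` to the model columns avoids discarding rows that are only missing values in columns the model never uses, which a blanket `na.omit(newdata)` would drop.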