This problem has the same symptoms as "train() in caret package returns an error about names & gsub", but as far as I can tell, the solution described there does not apply here.
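I have 8 binary target variables and 12 predictors (actually 577 predictors, but I have included only 12 in this minimal example). All of the target variables have the same number of positive cases: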
> require(caret)
> head(eg.data)
bottle cat chair face house scissors scrambledpix shoe X1
1 0 0 0 0 0 1 0 0 1.282427535
2 0 0 0 0 0 1 0 0 2.580423598
3 0 0 0 0 0 1 0 0 2.757994797
4 0 0 0 0 0 1 0 0 2.027544189
5 0 0 0 0 0 1 0 0 2.011910591
6 0 0 0 0 0 1 0 0 1.381372427
X2 X3 X4 X5 X6
1 0.56927127535 -0.41445500589 0.05883449623 1.325428161 3.009461590
2 0.99142631615 -0.29943837061 0.07639494922 1.523704820 2.827368769
3 2.03652352150 -0.17050305555 -0.31151493933 1.573253408 2.678044808
4 1.25721256063 -0.13619253754 0.51253133255 2.577229617 1.928547094
5 -0.08773097125 0.06366970261 0.39996831088 1.887088568 1.946206958
6 -0.25631254599 -0.02384295467 0.46782728851 1.200404398 1.325037590
X7 X8 X9 X10
1 0.06590922936 0.6734459904 -0.5028127515 -0.88796906295
2 1.74129314357 -0.7760203940 0.2435879550 0.96297913339
3 2.33400909898 0.0439339562 1.0221119115 0.07875704254
4 2.65188422088 -0.1230319426 1.6562415384 0.18348716525
5 1.69440143996 0.6049393761 1.0446174220 0.87828319489
6 1.43499026729 -0.2976883919 0.7316561774 0.43665437272
X11 X12
1 -1.7844737347 -2.1649063167
2 0.2034972031 -1.7478010604
3 0.9186460991 -0.3217861157
4 1.3983604989 -1.4887151593
5 1.0934001840 -1.8538057112
6 0.8168093363 -0.6653136097
> #all columns have different values specified
> apply(eg.data,2,function(col){return(length(unique(col)))})
bottle cat chair face house
2 2 2 2 2
scissors scrambledpix shoe X1 X2
2 2 2 864 864
X3 X4 X5 X6 X7
864 863 864 864 864
X8 X9 X10 X11 X12
864 864 863 864 863
> apply(eg.data[,1:8],2,table)
bottle cat chair face house scissors scrambledpix shoe
0 756 756 756 756 756 756 756 756
1 108 108 108 108 108 108 108 108
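I then tried running train() from caret. I ultimately want to use the neuralnet method, which (I think) requires the target variables to be formatted as a set of binary variables, but for now I am just trying svmLinear as a test case.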
> res <- train(train.formula,
+ eg.data,
+ method = "svmLinear",
+ trControl = trainControl(method="cv", number=10))
Error in cut.default(y, unique(quantile(y, probs = seq(0, 1, length = cuts))), :
invalid number of intervals
Running traceback():
> traceback()
8: stop("invalid number of intervals")
7: cut.default(y, unique(quantile(y, probs = seq(0, 1, length = cuts))),
include.lowest = TRUE)
6: cut(y, unique(quantile(y, probs = seq(0, 1, length = cuts))),
include.lowest = TRUE)
5: createFolds(y, trControl$number, returnTrain = TRUE)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(train.formula, eg.data, method = "svmLinear", trControl = trainControl(method = "cv",
number = 10))
1: train(train.formula, eg.data, method = "svmLinear", trControl = trainControl(method = "cv",
number = 10))
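The failing call at the bottom of the traceback (frames 6 to 8) can be reproduced outside caret. This is a minimal sketch, assuming cuts = 5 (the traceback does not show its value) and using a hypothetical constant vector. When every quantile of y is equal, unique() leaves a single break. cut.default() then interprets that single number as a count of intervals and stops with the same message. For comparison, a 0/1 vector with the class counts shown above does not trigger the error, so it may be worth checking what y actually reaches createFolds().

```r
# Sketch: reproduce the cut() call from the traceback in isolation.
# Assumption: cuts = 5 (its real value is not shown in the traceback).
cuts <- 5

# Hypothetical constant response: every quantile collapses to one value.
y_const <- rep(0, 20)
breaks_const <- unique(quantile(y_const, probs = seq(0, 1, length = cuts)))
print(breaks_const)  # a single break point

# With one break, cut() reads `breaks` as a number of intervals (here 0)
# and stops; catch the error and keep its message.
msg <- tryCatch(
  cut(y_const, breaks_const, include.lowest = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparison: a 0/1 vector with the class counts shown above
# (756 zeros, 108 ones) gives two distinct breaks and cuts without error.
y_binary <- c(rep(0, 756), rep(1, 108))
breaks_binary <- unique(quantile(y_binary, probs = seq(0, 1, length = cuts)))
print(breaks_binary)
binned <- cut(y_binary, breaks_binary, include.lowest = TRUE)
print(table(binned))
```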
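As you can see, this looks like the problem reported previously, but there are definitely no NA or NaN values here, so I don't know what the solution would be.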
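To back up the claim that the data contain no missing values, one option is to count non-finite entries in each column. This is a minimal sketch that uses a hypothetical four-row data frame `df` with values copied from head(eg.data) above; substitute the real eg.data. is.finite() returns FALSE for NA, NaN, Inf and -Inf, so this check also catches infinite values, which is.na() would miss.

```r
# Hypothetical stand-in for eg.data: first four rows of a few columns
# copied from head(eg.data) above. Substitute the real data frame.
df <- data.frame(
  bottle   = c(0, 0, 0, 0),
  scissors = c(1, 1, 1, 1),
  X1       = c(1.282427535, 2.580423598, 2.757994797, 2.027544189),
  X2       = c(0.56927127535, 0.99142631615, 2.03652352150, 1.25721256063)
)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so this counts all of them
# per column.
non_finite <- sapply(df, function(col) sum(!is.finite(col)))
print(non_finite)

# Names of any columns that contain at least one non-finite value.
print(names(non_finite)[non_finite > 0])
```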