Question

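I am using the createFolds function in R to create folds, and the call itself succeeds. But when I loop over the folds to do some computation on each one, I get the error shown below. The code is: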

set.seed(1000)
k <- 10
folds <- createFolds(train_data,k=k,list = TRUE, returnTrain = FALSE)
str(folds)

This is the output:

List of 10
 $ Fold01: int [1:18687] 1 8 10 21 22 25 26 29 34 35 ...
 $ Fold02: int [1:18685] 5 11 14 32 40 46 50 52 56 58 ...
 $ Fold03: int [1:18685] 16 20 39 47 49 77 78 83 84 86 ...
 $ Fold04: int [1:18685] 3 15 30 38 41 44 51 53 54 55 ...
 $ Fold05: int [1:18685] 7 9 17 18 23 37 42 67 75 79 ...
 $ Fold06: int [1:18686] 6 31 36 48 72 74 90 113 114 121 ...
 $ Fold07: int [1:18686] 2 33 59 61 100 103 109 123 137 161 ...
 $ Fold08: int [1:18685] 24 64 68 87 88 101 110 130 141 152 ...
 $ Fold09: int [1:18684] 4 27 28 66 70 85 97 105 112 148 ...
 $ Fold10: int [1:18684] 12 13 19 43 65 91 94 108 134 138 ...

However, the following code gives me an error:

for( i in 1:k ){
  testData <- train_data[folds[[i]], ]
  trainData <- train_data[(-folds[[i]]), ]
}

The error is:

> for( i in 1:k ){
+   testData <- train_data[folds[[i]], ]
+   trainData <- train_data[(-folds[[i]]), ]
+ }
Error in train_data[folds[[i]], ] : subscript out of bounds

I tried different seed values, but I get the same error. Any help is appreciated. Thanks!

Answer 1

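As I understand it, your problem arises because you pass the whole data frame train_data to createFolds. The k folds should be generated over the samples, i.e. the rows of the dataset, so createFolds needs a single vector with one element per row.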

For example:

library(caret)    # provides createFolds
library(kernlab)  # provides the spam dataset
data(spam)
dim(spam)  # 4601 rows (samples)
folds <- createFolds(y = spam$type, k = 10, list = TRUE, returnTrain = TRUE)
# Only one column, spam$type, is passed in,
# so the largest index does not exceed the number of rows:
max(unlist(folds))  # 4601
# and the indices can be used as row indices
head(spam[folds[[4]], ])

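Passing a whole data frame behaves much like passing a matrix: the matrix is first converted to a vector. A 5×10 matrix, for instance, is effectively turned into a 50-element vector, and the values in the folds are indices into that vector. If you then try to use those values as row indices of the data frame, they go out of bounds: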

r <- 8
c <- 10
m0 <- matrix(rnorm(r * c), r, c)
# Replace every cell with a random 0/1 value
features <- apply(m0, c(1, 2), function(x) sample(c(0, 1), 1))
features
folds <- createFolds(features, 4)
folds
max(unlist(folds))  # indexes the 80-element vector, so it can exceed the 8 rows

m0[folds[[2]], ]  # Error in m0[folds[[2]], ] : subscript out of bounds
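
Applied to the loop in the question, the fix could look roughly like the sketch below. It assumes the caret package is installed, and it uses a small made-up data frame with a hypothetical outcome column named label, since the real train_data is not shown. The point is only that createFolds receives one vector with one element per row.

```r
library(caret)

set.seed(1000)
# Hypothetical stand-in for train_data: 200 rows, 3 columns.
# The column name "label" is an assumption made for this sketch.
train_data <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  label = factor(sample(c("a", "b"), 200, replace = TRUE))
)

k <- 10
# Pass a single vector (here the outcome column), not the whole data frame,
# so every fold index is a valid row number.
folds <- createFolds(train_data$label, k = k, list = TRUE, returnTrain = FALSE)
stopifnot(max(unlist(folds)) <= nrow(train_data))

for (i in seq_len(k)) {
  testData  <- train_data[folds[[i]], ]   # rows held out in fold i
  trainData <- train_data[-folds[[i]], ]  # all remaining rows
  # Together the two pieces cover every row exactly once.
  stopifnot(nrow(testData) + nrow(trainData) == nrow(train_data))
}
```

Using the outcome column also lets createFolds balance the classes across folds. If your data has no natural outcome column, any vector with one element per row should serve the same purpose.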
