Question

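library(FNN)  # provides get.knnx()

Nearest_Centroid <- function(X_train, X_test, Y_train){
  names(X_test) = names(X_train)
  # results[i, k]: predicted digit (0-9) for test row i, stored in column k
  results = matrix(0, nrow(X_test), 10)
  indexindex = vector("list", 10)  # training rows of each digit class
  neighbors = vector("list", 10)   # neighbor indices within each class
  for(i in 1:10){
    indexindex[[i]] <- X_train[which(Y_train == (i-1)), ]
    # 10 nearest class-(i-1) training rows per test row; [,-1] drops the first column
    neighbors[[i]] <- get.knnx(indexindex[[i]], X_test,  k=10, algorithm=c("kd_tree"))$nn.index[,-1]
  }
  for(i in 1:nrow(X_test)){
    point_mat <- matrix(0, 10, 10)
    for(k in 1:10){
      for(l in 1:10){
        # column means of the class-l rows selected by the whole neighbors[[l]] matrix
        candidate = apply(indexindex[[l]][neighbors[[l]],], 2, mean)
        # Euclidean distance from test row i to that centroid
        point_mat[l, k] = sqrt(sum((X_test[i,] - candidate)^2))
      }
      # closest class; ties go to the largest label
      results[i, k] = max(which(point_mat[, k] == min(point_mat[,k]))) - 1
    }
  }
  return(results)
}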



dd <- Nearest_Centroid(train[,-257], test[,-257], train[, 257])

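I realize that finding the nearest neighbors takes some time, but train is only a 7290 x 256 matrix and test is only a 2007 x 256 matrix. This code has now been running for two days. By my rough estimate it should have finished overnight (I cut test down to rows 1:20, which took about a minute, and that is what I based my "rough estimate" on).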

(My data is the USPS digit recognizer dataset, which is similar to MNIST.)

I wrote the same function in Python and it finished within a few hours. Am I doing something wrong?

-盘龙-雷克斯

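Edit: I wrote a loop that runs the function on a few different numbers of test points and measures the run time. I fit a linear model to those timings and found that the expected run time for the 2007 test points should be about 1.8 hours, which closely matches my Python code.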
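
A minimal sketch of how such a timing extrapolation could be set up in R. The helper name `estimate_runtime`, its arguments, and the subset sizes are illustrative assumptions, not the code that was actually used for the estimate above. It assumes run time grows roughly linearly with the number of test rows.

```r
# Hypothetical helper: time a function on the first n test rows for several n,
# fit secs ~ n with lm(), and extrapolate to the full test set.
estimate_runtime <- function(fit_fun, X_test, sizes, n_full = nrow(X_test)) {
  secs <- vapply(sizes, function(n) {
    # elapsed wall-clock seconds for the first n test rows
    subset_rows <- X_test[seq_len(n), , drop = FALSE]
    unname(system.time(fit_fun(subset_rows))["elapsed"])
  }, numeric(1))
  timings <- data.frame(n = sizes, secs = secs)
  model <- lm(secs ~ n, data = timings)
  predicted <- unname(predict(model, newdata = data.frame(n = n_full)))
  list(timings = timings, model = model, predicted_secs = predicted)
}

# Possible usage with the objects from the question (not run here):
# est <- estimate_runtime(
#   function(X) Nearest_Centroid(train[, -257], X, train[, 257]),
#   test[, -257], sizes = c(5, 10, 20, 40)
# )
# est$predicted_secs / 3600  # estimated hours for all test rows
```

If the measured timings are clearly not on a straight line, the linear extrapolation will be misleading, so it is worth looking at `est$timings` before trusting the prediction.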

Nearest centroid function takes longer than expected

0 answers: