Optimizing a for loop in R for kNN

Date: 2017-04-28 07:36:50

Tags: r for-loop optimization knn

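I've written a custom k-nearest-neighbours algorithm (k = 1) that also works with categorical variables (char, string). However, it is slow because it uses a for loop. Is there a way to optimize this loop so that it performs better?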

library(plyr)  # for rbind.fill

# Assumed to exist before the loop (not shown in the question):
# b, value, max.val, output and output2.
for (i in 1:nrow(test.data)) {
  # predictors of test row i (every training column except the label)
  a <- c(test.data[i, 1:(ncol(training.data) - 1)])

  # weighted number of predictors each training row shares with `a`
  final <- t((t(b) == a) * value)
  final[is.na(final)] <- 0
  sum.value <- rowSums(final)

  final1 <- cbind(training.data, sum.value)
  final1 <- final1[order(-sum.value),]
  final1 <- final1[final1$sum.value > 0,]

  suggestion <- unique(final1[,ncol(training.data)])

  if (length(unique(training.data[sum.value == max.val, ncol(training.data)])) < 5) {
    # fewer than 5 labels reach max.val: take the top 5 distinct labels
    suggestion <- suggestion[1:5]
    output.line <- final1[!duplicated(final1$label),1:ncol(training.data)]
    output.line <- output.line[1:5,]
  } else {
    # 5 or more labels reach max.val: keep all of them
    suggestion <- unique(training.data[sum.value == max.val, ncol(training.data)])
    output.line <- unique(training.data[sum.value == max.val,])
  }
  output <- rbind.fill(output, data.frame(t(c(test.ID = test.data[i, ncol(training.data)], suggestion))))

  # number of predictors each suggested row shares with the test row
  sc <- t(t(output.line[,1:(ncol(training.data) - 1)]) == a)
  sc[is.na(sc)] <- 0
  sc <- rowSums(sc)

  output.line <- cbind(output.line,sc)
  output.line <- rbind.fill(cbind(test.data[i,],sc = 0),output.line)
  output2 <- rbind(output2,output.line)
}
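Part of the cost in this loop is likely that `output` and `output2` grow through `rbind`/`rbind.fill` on every iteration, which copies the accumulated data frame each time. A common alternative is to store each iteration's result in a preallocated list and bind once at the end. The sketch below uses a hypothetical `make_row()` as a stand-in for whatever the loop body builds per test row; it is not the question's kNN logic.

```r
# Hypothetical stand-in for the one-row result built for test row i.
make_row <- function(i) {
  data.frame(test.ID = i, suggestion = paste0("label", i),
             stringsAsFactors = FALSE)
}

n <- 500

# Slow pattern: the result is copied and extended on every iteration.
output.slow <- NULL
for (i in seq_len(n)) {
  output.slow <- rbind(output.slow, make_row(i))
}

# Faster pattern: fill a preallocated list, then bind once.
pieces <- vector("list", n)
for (i in seq_len(n)) {
  pieces[[i]] <- make_row(i)
}
output.fast <- do.call(rbind, pieces)
```

plyr's `rbind.fill` also accepts a list of data frames as its first argument, so `rbind.fill(pieces)` plays the same role when the pieces have differing columns, as they do in the question.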

The variables are as follows:

training.data = ~21 columns of data (with numbers and strings)
test.data = ~20% of training.data
label = obviously the label column
b = training.data without the label column
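To make the scoring step concrete: `value` is not listed above, so this sketch assumes it is a numeric vector with one weight per predictor column. The toy data is hypothetical; the three lines computing `final` and `sum.value` are the question's own scoring expression.

```r
# Hypothetical toy training data: two categorical predictors plus a label.
training.data <- data.frame(
  colour = c("red", "blue", "red"),
  size   = c("S", "M", "M"),
  label  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
b <- training.data[, -ncol(training.data)]  # predictors only
value <- c(1, 2)  # assumed: one weight per predictor column

# One test row, predictors only (a plain character vector here).
a <- c(colour = "red", size = "M")

# The question's scoring step: each column of t(b) is one training row,
# so `a` and `value` are recycled so that element k lines up with predictor k.
final <- t((t(b) == a) * value)
final[is.na(final)] <- 0      # NA comparisons count as non-matches
sum.value <- rowSums(final)   # weighted number of matching predictors

max.val <- max(sum.value)
training.data$label[sum.value == max.val]
```

Here the scores are 1, 2 and 3 for the three training rows, so the nearest neighbour is the third row with label `"c"`.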

This uses the plyr package for its rbind.fill function.

I hope you can help with this. Thanks!

2 answers:

Answer 0 (score: 0)

This is untested, but it should do the job:

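library(plyr)  # for rbind.fill

# Loop over row indices with lapply() rather than apply(test.data, 1, ...):
# apply() would turn each row into a character vector holding every column.
# b, value, max.val, training.data and test.data are assumed to exist as in the question.
res <- lapply(seq_len(nrow(test.data)), function(i) {
  row <- test.data[i, ]
  a <- c(row[1:(ncol(training.data) - 1)])  # predictors only

  final <- t((t(b) == a) * value)
  final[is.na(final)] <- 0
  sum.value <- rowSums(final)

  final1 <- cbind(training.data, sum.value)
  final1 <- final1[order(-sum.value),]
  final1 <- final1[final1$sum.value > 0,]

  suggestion <- unique(final1[,ncol(training.data)])

  if (length(unique(training.data[sum.value == max.val, ncol(training.data)])) < 5) {
    suggestion <- suggestion[1:5]
    output.line <- final1[!duplicated(final1$label),1:ncol(training.data)]
    output.line <- output.line[1:5,]
  } else {
    suggestion <- unique(training.data[sum.value == max.val, ncol(training.data)])
    output.line <- unique(training.data[sum.value == max.val,])
  }

  sc <- t(t(output.line[,1:(ncol(training.data) - 1)]) == a)
  sc[is.na(sc)] <- 0
  sc <- rowSums(sc)
  output.line <- cbind(output.line,sc)

  # Return both per-row results; assigning to output/output2 inside the
  # function would only modify local copies.
  list(
    output  = data.frame(t(c(test.ID = row[[ncol(training.data)]], suggestion))),
    output2 = rbind.fill(cbind(row, sc = 0), output.line)
  )
})

# Bind everything once at the end instead of growing the tables row by row.
output  <- rbind.fill(lapply(res, `[[`, "output"))
output2 <- rbind.fill(lapply(res, `[[`, "output2"))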
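A caveat with `apply(test.data, 1, ...)` on a data frame: `apply()` first converts it to a matrix, so with mixed column types every row arrives as a character vector that also contains the label/ID column. Apply-family functions such as `apply()` and `lapply()` are also loops themselves, so any speed-up over the original for loop is more likely to come from not growing objects with `rbind` on every iteration. A small demonstration with a hypothetical data frame:

```r
# Hypothetical data frame with mixed column types, like test.data.
test.data <- data.frame(
  colour = c("red", "blue"),
  weight = c(1.5, 10),
  id     = c("t1", "t2"),
  stringsAsFactors = FALSE
)

# apply() coerces the data frame to a character matrix first, so every
# row (including the numeric column) arrives as a character vector.
row.classes <- apply(test.data, 1, function(row) class(row))
row.classes

# lapply() over row indices keeps each column's original type.
rows <- lapply(seq_len(nrow(test.data)), function(i) test.data[i, ])
sapply(rows, function(r) class(r$weight))
```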

Answer 1 (score: 0)

I haven't implemented this solution myself, but the following article shows a way to avoid the loop:

https://dzone.com/articles/improved-r-implementation-of-collaborative-filteri-1
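In the same spirit as that article (this is not its code), the per-row loop can be replaced by a loop over the roughly 20 predictor columns. Here `outer()` compares one test column with one training column for all row pairs at once, building an n_test × n_train score matrix. The sketch assumes, as in the question, that `value` holds one weight per predictor and that NA comparisons count as non-matches; the data is hypothetical. Comparing column by column also avoids `t(b)`, which coerces the whole data frame to a character matrix. Memory grows with n_test × n_train, so very large sets may need to be scored in chunks of test rows.

```r
# Hypothetical toy data: predictors in the first columns, label last.
training.data <- data.frame(
  colour = c("red", "blue", "red"),
  size   = c("S", "M", "M"),
  label  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
test.data <- data.frame(
  colour = c("red", "blue"),
  size   = c("M", NA),
  stringsAsFactors = FALSE
)
value <- c(1, 2)  # assumed: one weight per predictor column

p <- ncol(training.data) - 1  # number of predictor columns

# score[i, j] = sum over predictors k of value[k] * (test[i, k] == train[j, k])
score <- matrix(0, nrow = nrow(test.data), ncol = nrow(training.data))
for (k in seq_len(p)) {
  hit <- outer(test.data[[k]], training.data[[k]], "==")
  hit[is.na(hit)] <- FALSE  # NA comparisons count as non-matches
  score <- score + value[k] * hit
}

# 1-NN: the best-scoring training row for every test row
# (ties broken by taking the first one).
best <- max.col(score, ties.method = "first")
data.frame(test.row = seq_len(nrow(test.data)),
           suggestion = training.data$label[best])
```

To return several suggestions per test row, as the question does, `order(-score[i, ])` ranks the training rows for test row `i`.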