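I have written this custom k-Nearest Neighbors algorithm (k = 1), which also works with categorical variables (char, string). However, it is slow because it uses a for loop. Is there a way to optimize this loop so that it performs better?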
# value, max.val, output and output2 are defined outside this snippet (not shown)
for (i in 1:nrow(test.data)) {
  # feature values of the i-th test row (label column excluded)
  a <- c(test.data[i, 1:(ncol(training.data) - 1)])
  # weighted matches between the test row and every training row
  final <- t((t(b) == a) * value)
  final[is.na(final)] <- 0
  sum.value <- rowSums(final)
  final1 <- cbind(training.data, sum.value)
  final1 <- final1[order(-sum.value),]
  final1 <- final1[final1$sum.value > 0,]
  suggestion <- unique(final1[,ncol(training.data)])
  if (length(unique(training.data[sum.value == max.val, ncol(training.data)])) < 5) {
    suggestion <- suggestion[1:5]
    output.line <- final1[!duplicated(final1$label), 1:ncol(training.data)]
    output.line <- output.line[1:5,]
  } else {
    suggestion <- unique(training.data[sum.value == max.val, ncol(training.data)])
    output.line <- unique(training.data[sum.value == max.val,])
  }
  output <- rbind.fill(output, data.frame(t(c(test.ID = test.data[i, ncol(training.data)], suggestion))))
  # unweighted match count between the test row and each suggested row
  sc <- t(t(output.line[, 1:(ncol(training.data) - 1)]) == a)
  sc[is.na(sc)] <- 0
  sc <- rowSums(sc)
  output.line <- cbind(output.line, sc)
  output.line <- rbind.fill(cbind(test.data[i,], sc = 0), output.line)
  output2 <- rbind(output2, output.line)
}
The variables are as follows:
training.data = ~21 columns of data (with numbers and strings)
test.data = ~20% of training.data
label = obviously the label column
b = training.data without the label column
This uses the plyr package because it provides the rbind.fill function.
I hope you can help with this. Thanks!
Answer 0: (score: 0)
This is untested, but it should do the job:
# apply() converts test.data to a character matrix, so each row arrives as a character vector
output2 <- apply(test.data, 1, function(row) {
  a <- row[1:(ncol(training.data) - 1)]  # feature values only, label excluded
  final <- t((t(b) == a) * value)
  final[is.na(final)] <- 0
  sum.value <- rowSums(final)
  final1 <- cbind(training.data, sum.value)
  final1 <- final1[order(-sum.value),]
  final1 <- final1[final1$sum.value > 0,]
  suggestion <- unique(final1[,ncol(training.data)])
  if (length(unique(training.data[sum.value == max.val, ncol(training.data)])) < 5) {
    suggestion <- suggestion[1:5]
    output.line <- final1[!duplicated(final1$label), 1:ncol(training.data)]
    output.line <- output.line[1:5,]
  } else {
    suggestion <- unique(training.data[sum.value == max.val, ncol(training.data)])
    output.line <- unique(training.data[sum.value == max.val,])
  }
  # <<- so that the update to the global output survives the function call
  output <<- rbind.fill(output, data.frame(t(c(row, suggestion))))
  sc <- t(t(output.line[, 1:(ncol(training.data) - 1)]) == a)
  sc[is.na(sc)] <- 0
  sc <- rowSums(sc)
  output.line <- cbind(output.line, sc)
  # test row first (sc = 0), followed by its suggested training rows
  rbind.fill(cbind(data.frame(t(row)), sc = 0), output.line)
})
# output2 is a list with one data frame per test row
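When the function passed to apply() returns a data frame for each row, apply() hands back a list of data frames rather than one table. A minimal base R sketch of combining such a list in a single step, which also avoids growing a data frame with rbind on every iteration. The object pieces and its columns are made up for illustration and only stand in for the per-row results above. If the pieces had differing columns, rbind.fill from plyr, which also accepts a list of data frames, would be the closer match to the original code.

```r
# Hypothetical stand-in for the per-row results: one small data frame per test row.
pieces <- lapply(1:3, function(i) {
  data.frame(test.row = i, sc = c(0, i), stringsAsFactors = FALSE)
})

# Bind all pieces once at the end instead of calling rbind inside the loop.
combined <- do.call(rbind, pieces)
```

Binding once at the end means each piece is copied a single time, whereas rbind inside a loop copies the accumulated result again on every iteration.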
Answer 1: (score: 0)
I have not implemented this solution myself, but this article shows a way to avoid the loop.
https://dzone.com/articles/improved-r-implementation-of-collaborative-filteri-1
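As a rough illustration of avoiding the per-row loop (this sketch is not taken from the linked article), the weighted match count can be computed for all test rows at once by looping over the ~20 columns instead of over the rows. The function name match_scores, the toy data, and train_label are made up for the example. The weights argument plays the role of value in the question, under the assumption that value holds one weight per feature column.

```r
# Sketch only: match_scores and the toy data below are hypothetical.
match_scores <- function(train_x, test_x, weights) {
  # one row per test observation, one column per training observation
  scores <- matrix(0, nrow = nrow(test_x), ncol = nrow(train_x))
  for (j in seq_len(ncol(train_x))) {
    # TRUE where a test row and a training row agree on column j
    eq <- outer(as.character(test_x[[j]]), as.character(train_x[[j]]), "==")
    eq[is.na(eq)] <- FALSE  # a missing value never counts as a match
    scores <- scores + weights[j] * eq
  }
  scores
}

train_x <- data.frame(color = c("red", "blue", "red"),
                      size = c("S", "M", "L"),
                      stringsAsFactors = FALSE)
train_label <- c("a", "b", "c")
test_x <- data.frame(color = c("red", "green"),
                     size = c("M", "L"),
                     stringsAsFactors = FALSE)

scores <- match_scores(train_x, test_x, weights = c(2, 1))
# index of the best-scoring training row per test row (first one wins ties)
best <- max.col(scores, ties.method = "first")
suggestion <- train_label[best]
```

Comparing columns as character vectors keeps numeric and string features on the same footing. The score matrix has nrow(test_x) * nrow(train_x) entries, so for large data sets it may need to be built in chunks of test rows.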