I would like to ask whether this function can be rewritten using data.table methods:
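myfunction <- function(i) {
  # Row i of the test data (features only) as a plain vector, so that it
  # recycles over the columns of t(b)
  a <- unlist(test.dt[i, 1:21, with = FALSE])
  # Weight each matching feature, then transpose back to one row per training row
  final <- t((t(b) == a) * value)
  final[is.na(final)] <- 0
  sum.value <- rowSums(final)
  # Rank the training rows by score and drop rows without any match
  final1 <- cbind(train.dt, sum.value)
  final1 <- final1[order(-sum.value), ]
  final1 <- final1[final1$sum.value > 0, ]
  # Top 5 distinct labels (column 22)
  suggestion <- unique(final1[, 22, with = FALSE])
  suggestion <- suggestion[1:5, ]
  return(suggestion)
}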
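This is a custom kNN function that I use on character columns. It returns the top 5 suggestions/predictions. However, I run into performance problems when I run it on large test data, and so far I have not been able to tune it myself.
The variables used are as follows: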
train.dt -- the training data, includes 22 columns (21 features, 1 label column)
test.dt -- the test data, same structure as training data
value -- a vector that contains the weights/importance value of 21 features
sum.value -- for each training row, the sum of the weights in value for the features that match the test row
b -- has the same data as the training data, but excluding the label column
a -- row i of the test data, excluding the label column
suggestion -- the output
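To make the scoring step concrete, here is a minimal sketch on made-up toy data. It assumes 3 features instead of 21 and a plain character matrix instead of a data.table, and all values (including the `labels` vector) are hypothetical:

```r
# Toy stand-ins (hypothetical): 4 training rows, 3 features each.
b <- matrix(c("x", "y", "z",
              "x", "q", "z",
              "p", "y", "r",
              "p", "q", "r"),
            ncol = 3, byrow = TRUE)
labels <- c("A", "B", "C", "D")  # label column of the training data
value  <- c(4, 2, 1)             # one weight per feature
a      <- c("x", "y", "r")       # one test row, label excluded

# t(b) has one column per training row, so `a` and `value` both recycle
# down each column, i.e. feature by feature.
matches   <- t(b) == a
sum.value <- colSums(matches * value)

# Training labels ranked by score, best match first.
ranked <- labels[order(-sum.value)]
```

Here `sum.value` holds one score per training row, which is the quantity the function sorts on.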
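I would also like to use lapply (or any appropriate member of the apply family) with this function. The variable i in the function refers to the row number in the test data, meaning that I want to apply the function to every row of the test data. I have not managed to do this yet.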
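For reference, this is roughly the calling pattern I am after, as a minimal sketch on toy data. It assumes plain character matrices rather than data.table, and `suggest_for_row`, `train.b`, `train.labels` and `test.a` are hypothetical names used only for illustration:

```r
# Toy stand-ins (hypothetical) for the training features, labels and test rows.
train.b <- matrix(c("x", "y", "z",
                    "x", "q", "z",
                    "p", "y", "r",
                    "p", "q", "r"),
                  ncol = 3, byrow = TRUE)
train.labels <- c("A", "B", "C", "D")
test.a <- matrix(c("x", "y", "r",
                   "p", "q", "z"),
                 ncol = 3, byrow = TRUE)
value <- c(4, 2, 1)  # one weight per feature

# Hypothetical helper: up to k distinct labels for test row i, best first.
suggest_for_row <- function(i, k = 5) {
  # Weighted count of matching features, one score per training row
  score <- colSums((t(train.b) == test.a[i, ]) * value, na.rm = TRUE)
  keep  <- order(-score)
  keep  <- keep[score[keep] > 0]  # drop training rows with no match
  head(unique(train.labels[keep]), k)
}

# seq_len(nrow(test.a)) yields 1, 2, ..., n, so the helper runs once per test row.
suggestions <- lapply(seq_len(nrow(test.a)), suggest_for_row)
```

The result is a list with one element per test row. Whether this pattern is fast enough on large test data is exactly what I am unsure about.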
I hope this is clear. Thank you in advance!