Using a custom function on a data.table

Asked: 2017-08-10 06:44:51

Tags: r data.table knn

I would like to ask whether this function can be rewritten in the data.table way:

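myfunction <- function(i) {

  # Features of the i-th test row (columns 1 to 21)
  a <- test.dt[i, 1:21, with = FALSE]

  # Weight of each training feature that matches the test row; NA counts as no match
  final <- t((t(b) == a) * value)
  final[is.na(final)] <- 0
  sum.value <- rowSums(final)

  # Rank the training rows by score and drop rows without any match
  final1 <- cbind(train.dt, sum.value)
  final1 <- final1[order(-sum.value), ]
  final1 <- final1[final1$sum.value > 0, ]

  # Distinct labels (column 22) in rank order, first five
  suggestion <- unique(final1[, 22, with = FALSE])
  suggestion <- suggestion[1:5, ]

  return(suggestion)
}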

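This is a custom kNN function that I use on character columns. It returns the top 5 suggestions/predictions. However, it runs into performance problems when I execute it on a large test set, and so far I have not been able to tune it myself.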

The variables used are as follows:

train.dt -- the training data, includes 22 columns (21 features, 1 label column)
test.dt -- the test data, same structure as training data
value -- a vector that contains the weights/importance value of 21 features
sum.value -- for each training row, the sum of the weights in value over the features that match the test row (at most sum(value))
b -- has the same data as the training data, but excluding the label column
a -- has the same data as the test data, but excluding the label column
suggestion -- the output
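
To make the roles of these variables concrete, here is a minimal sketch of the scoring step for a single test row. The toy data is hypothetical: it uses 3 feature columns instead of 21, and the test row is flattened with unlist() so that the comparison against t(b) is a plain vector comparison, which is an assumption rather than what the function above does.

```r
library(data.table)

# Hypothetical toy data: 3 feature columns (instead of 21) plus a label column.
train.dt <- data.table(
  f1    = c("a", "a", "b", "b"),
  f2    = c("x", "y", "x", "y"),
  f3    = c("p", "p", "q", NA),
  label = c("L1", "L2", "L3", "L4")
)
test.dt <- data.table(f1 = "a", f2 = "x", f3 = "q", label = NA_character_)

value <- c(3, 2, 1)                                # one weight per feature column
b <- as.matrix(train.dt[, 1:3, with = FALSE])      # training features, character matrix
a <- unlist(test.dt[1, 1:3, with = FALSE])         # one test row as a character vector

# t(b) holds one training row per column, so `== a` compares feature by feature
# and `* value` replaces every match with the weight of that feature.
final <- t((t(b) == a) * value)
final[is.na(final)] <- 0                           # a missing value never matches
sum.value <- rowSums(final)                        # weighted match score per training row
```

In this toy case the first training row matches on f1 and f2, so its score is 3 + 2 = 5, while the last row matches nothing and scores 0.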

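I would also like to use lapply (or any suitable member of the apply family) with this function. The variable i in the function is the row number in the test data, meaning that I want to apply the function to every row of the test data. I have not managed to do this yet.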
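
One possible way to apply the function to every test row is to pass the row indices to lapply. The sketch below is not a tuned solution: the toy data, the helper name suggest_one and its argument k are hypothetical. It converts the feature columns to matrices once, outside the per-row function, on the assumption that repeated data.table subsetting and cbind calls account for part of the cost.

```r
library(data.table)

# Hypothetical toy data: 3 feature columns (instead of 21) plus a label column.
train.dt <- data.table(
  f1    = c("a", "a", "b", "b"),
  f2    = c("x", "y", "x", "y"),
  f3    = c("p", "p", "q", NA),
  label = c("L1", "L2", "L3", "L4")
)
test.dt <- data.table(
  f1 = c("a", "b"), f2 = c("x", "y"), f3 = c("q", "p"),
  label = NA_character_
)
value  <- c(3, 2, 1)   # one weight per feature column
n.feat <- 3            # 21 in the original question

# Do the conversions once instead of once per test row.
tb     <- t(as.matrix(train.dt[, 1:n.feat, with = FALSE]))  # features x training rows
labels <- train.dt[[n.feat + 1]]                            # label column as a vector
test.m <- as.matrix(test.dt[, 1:n.feat, with = FALSE])

# Hypothetical helper: top-k distinct labels for test row i.
suggest_one <- function(i, k = 5) {
  hit <- tb == test.m[i, ]          # compare every training row with test row i
  hit[is.na(hit)] <- FALSE          # a missing value never matches
  score <- colSums(hit * value)     # weighted match score per training row
  keep <- order(-score)             # best score first
  keep <- keep[score[keep] > 0]     # drop training rows without any match
  head(unique(labels[keep]), k)
}

# One call per test row; the result is a list of character vectors.
suggestions <- lapply(seq_len(nrow(test.dt)), suggest_one)
```

Here suggestions[[i]] holds the labels suggested for the i-th test row. Unlike suggestion[1:5, ], head() does not pad the result with NA when fewer than five labels have a positive score.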

I hope this is clear. Thank you in advance!

0 answers:

No answers