Question

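I have a large data.frame that I want to split into smaller ones based on some "unique_keys" (in the MySQL sense). Currently I do this with the loop below, but for 10k rows it takes a long time, about 45 seconds.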

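true_counter <- 0  # number of key groups whose top-ranked row has is_v == 1

for (i in seq_len(nrow(identifiers_test))) {
  # Select all rows of data_test matching the i-th key combination
  data_test_offer <- data_test[identifiers_test[i, "m_id"] == data_test[, "m_id"] &
                               identifiers_test[i, "a_id"] == data_test[, "a_id"] &
                               identifiers_test[i, "condition"] == data_test[, "condition"] &
                               identifiers_test[i, "time_of_change"] == data_test[, "time_of_change"], ]

  # Sort data by highest prediction
  data_test_offer <- data_test_offer[order(-data_test_offer[, "prediction"]), ]

  # Count the group if its best-scoring row is flagged (skip keys with no rows)
  if (nrow(data_test_offer) > 0 && data_test_offer[1, "is_v"] == 1) {
    true_counter <- true_counter + 1
  }
}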
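The question does not include sample data, so here is a small made-up data set (the values are hypothetical; only the column names come from the question) run through the same loop, with true_counter initialised to 0 and a guard for keys that match no rows. It gives a reference count that faster versions can be checked against.

```r
# Hypothetical sample data; only the column names come from the question
data_test <- data.frame(
  m_id = c(1, 1, 1, 2, 2, 3), a_id = c(10, 10, 10, 20, 20, 30),
  condition = c("new", "new", "new", "used", "used", "new"),
  time_of_change = c(100, 100, 100, 200, 200, 300),
  prediction = c(0.2, 0.9, 0.5, 0.7, 0.1, 0.4),
  is_v = c(0, 1, 0, 0, 1, 1), stringsAsFactors = FALSE)
identifiers_test <- data.frame(
  m_id = c(1, 2), a_id = c(10, 20), condition = c("new", "used"),
  time_of_change = c(100, 200), stringsAsFactors = FALSE)

true_counter <- 0
for (i in seq_len(nrow(identifiers_test))) {
  # Each iteration rescans every row of data_test for one key combination
  rows <- data_test[data_test$m_id == identifiers_test$m_id[i] &
                    data_test$a_id == identifiers_test$a_id[i] &
                    data_test$condition == identifiers_test$condition[i] &
                    data_test$time_of_change == identifiers_test$time_of_change[i], ]
  rows <- rows[order(-rows$prediction), ]
  if (nrow(rows) > 0 && rows$is_v[1] == 1) true_counter <- true_counter + 1
}
print(true_counter)  # 1 for this sample data
```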

How can I refactor this to be more "R"-like and faster?

Answer 1

You are filtering one data.frame by another data.frame before applying a per-group operation. I would use merge and then by.

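ID <- c("m_id", "a_id", "condition", "time_of_change")
# Keep only the rows of data_test whose key appears in identifiers_test
filter_data <- merge(data_test, identifiers_test, by = ID)
# Split by key combination and sort each piece by decreasing prediction
by(filter_data, do.call(paste, filter_data[, ID]),
   FUN = function(x) x[order(-x[, "prediction"]), ])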
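A self-contained sketch of the merge + by approach on made-up data (the values are hypothetical). The counting step, which the answer leaves implicit, adds up the groups whose top-ranked row has is_v == 1, which is what the question's loop computes. The last three lines are a swapped-in alternative, not part of the answer: sort once and keep the first row of each key with duplicated(), so no splitting into groups is needed at all.

```r
# Hypothetical sample data; only the column names come from the question
data_test <- data.frame(
  m_id = c(1, 1, 1, 2, 2, 3), a_id = c(10, 10, 10, 20, 20, 30),
  condition = c("new", "new", "new", "used", "used", "new"),
  time_of_change = c(100, 100, 100, 200, 200, 300),
  prediction = c(0.2, 0.9, 0.5, 0.7, 0.1, 0.4),
  is_v = c(0, 1, 0, 0, 1, 1), stringsAsFactors = FALSE)
identifiers_test <- data.frame(
  m_id = c(1, 2), a_id = c(10, 20), condition = c("new", "used"),
  time_of_change = c(100, 200), stringsAsFactors = FALSE)

ID <- c("m_id", "a_id", "condition", "time_of_change")

# Inner join: keep only rows of data_test whose key appears in identifiers_test
filter_data <- merge(data_test, identifiers_test, by = ID)

# One data.frame per key combination, each sorted by decreasing prediction
groups <- by(filter_data, do.call(paste, filter_data[, ID]),
             FUN = function(x) x[order(-x[, "prediction"]), ])

# Count groups whose best-scoring (first) row has is_v == 1
true_counter <- sum(vapply(groups, function(g) g[1, "is_v"] == 1, logical(1)))
print(true_counter)  # 1 for this sample data

# Alternative (not from the answer): sort once, keep the first row per key
sorted <- filter_data[order(-filter_data$prediction), ]
top <- sorted[!duplicated(sorted[, ID]), ]
true_counter_alt <- sum(top$is_v == 1)
```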

Of course, the same thing can be written more efficiently with data.table:

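library(data.table)
setkeyv(setDT(identifiers_test), ID)
setkeyv(setDT(data_test), ID)
# Join, sort by decreasing prediction, keep the top row of each key group
top <- data_test[identifiers_test, nomatch = 0L][order(-prediction)][, .SD[1L], by = ID]
true_counter <- sum(top$is_v == 1)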
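A runnable sketch of the data.table version on hypothetical data, assuming the data.table package is installed. It does a keyed join with nomatch = 0L so identifiers without matching rows are dropped, then keeps the first row per key after sorting by prediction in decreasing order.

```r
library(data.table)

# Hypothetical sample data; only the column names come from the question
data_test <- data.table(
  m_id = c(1, 1, 1, 2, 2, 3), a_id = c(10, 10, 10, 20, 20, 30),
  condition = c("new", "new", "new", "used", "used", "new"),
  time_of_change = c(100, 100, 100, 200, 200, 300),
  prediction = c(0.2, 0.9, 0.5, 0.7, 0.1, 0.4),
  is_v = c(0, 1, 0, 0, 1, 1))
identifiers_test <- data.table(
  m_id = c(1, 2), a_id = c(10, 20), condition = c("new", "used"),
  time_of_change = c(100, 200))

ID <- c("m_id", "a_id", "condition", "time_of_change")
setkeyv(data_test, ID)
setkeyv(identifiers_test, ID)

# Keyed join; nomatch = 0L drops identifiers with no matching rows
joined <- data_test[identifiers_test, nomatch = 0L]

# Sort by decreasing prediction, then keep the first (best) row per key
top <- joined[order(-prediction)][, .SD[1L], by = ID]
true_counter <- sum(top$is_v == 1)
print(true_counter)  # 1 for this sample data
```

The join and the grouped .SD[1L] each make a single pass over the data, instead of rescanning data_test once per identifier as the loop does, which is likely where most of the 45 seconds goes.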

Note: since you did not provide test data, this answer has not been tested.

How to efficiently split a data.frame into small pieces and process them
