Question

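I am trying to convert a subsetting operation on a data.frame to data.table in order to improve the performance of my code, but I am completely new to data.table. What is the data.table equivalent of this subsetting statement?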

for(ii in 1:nplayer)
   {
   subgame<-subset(game, game$playerA == player[ii] | game$playerB == player[ii])
   players[ii,4]<-nrow(subgame)
   }

I have defined the new data.table gameDT this way:

  gameDT<-data.table(game)
  setkey(gameDT,playerA,playerB)

Output of dput for the first two rows:

  >dput(game[1:2,])
   structure(list(country = c("New Zealand", "Australia"), tournament = c("WTA Auckland 2012", 
   "WTA Brisbane 2012"), date = c("2011-12-31 00:00:00", "2011-12-30 00:15:00"
   ), playerA = c("Schoofs B.", "Lucic M."), playerB = c("Puig M.", 
   "Tsurenko L."), resultA = c(1L, 1L), resultB = c(2L, 2L), oddA = c("1.8", 
   "2.17"), oddB = c("1.9", "1.57"), N = c(4L, 3L), Weight = c(1, 
   0.973608997871031)), .Names = c("country", "tournament", "date", 
   "playerA", "playerB", "resultA", "resultB", "oddA", "oddB", "N", 
   "Weight"), row.names = 1:2, class = "data.frame")

Answer 1

If this is not purely an exercise in learning data.table, you might consider using lapply.

I think the example below is similar to what you are attempting, and it shows a fairly good speedup from lapply:

set.seed(123)
library(microbenchmark)

game = data.frame(runif(50) , playerA = sample(letters[1:5], 50, replace = TRUE), playerB = sample(letters[1:5], 50, replace = TRUE))

player <- union(game$playerA, game$playerB)
nplayer <-  length(player)
players <- matrix(player, nrow = nplayer, ncol = 2) 

op  <- microbenchmark(
  LAPPLY = {counts <- lapply(1:nplayer, 
                             function(i) sum(game$playerA == player[i] | game$playerB == player[i]))
            names(counts) <- player }, 
  ORIG = {
      for(ii in 1:nplayer)
        {
          subgame<-subset(game, game$playerA == player[ii] | game$playerB == player[ii])
          players[ii,2]<-nrow(subgame)
        }},
  times = 1000)

op

#Unit: microseconds
#   expr     min       lq   median        uq       max neval
# LAPPLY 236.493 251.9985  259.095  269.3205  8323.701  1000
#   ORIG 938.194 981.9060 1002.880 1036.6705 61095.935  1000

unlist(counts)

# a  c  d  b  e 
#19 17 20 20 15 

players

#     [,1] [,2]
#[1,] "a"  "19"
#[2,] "c"  "17"
#[3,] "d"  "20"
#[4,] "b"  "20"
#[5,] "e"  "15"
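
For the data.table side of the question, here is a minimal sketch, not a benchmarked solution. It assumes a table shaped like the toy data above; the names game_id, long, counts_loop and counts_grouped are made up for illustration. The first form translates the loop directly (filter in i, count with .N in j). The second stacks both player columns and counts by group, dropping duplicates so that a row where playerA equals playerB is counted once, as in the original subset() call.

```r
library(data.table)

set.seed(123)
# Toy data in the same shape as the example above (assumed, not the real data)
gameDT <- data.table(
  playerA = sample(letters[1:5], 50, replace = TRUE),
  playerB = sample(letters[1:5], 50, replace = TRUE)
)
player <- union(gameDT$playerA, gameDT$playerB)

# Direct translation of the loop: filter rows in i, count them with .N in j
counts_loop <- vapply(
  player,
  function(p) gameDT[playerA == p | playerB == p, .N],
  integer(1)
)

# Grouped alternative: stack both columns in long form, tagging each row
# with its row number (game_id is a hypothetical helper column), drop the
# duplicate pairs that arise when playerA == playerB, then count by player
long <- unique(rbind(
  gameDT[, .(game_id = .I, player = playerA)],
  gameDT[, .(game_id = .I, player = playerB)]
))
counts_grouped <- long[, .N, by = player]
```

Neither form depends on the key set with setkey(gameDT, playerA, playerB) in the question. Whether the grouped version is actually faster on the real data would need to be measured, for example with microbenchmark as above.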
