Question

I have to fill a vector (and then a data.table) with the values of functions that sometimes return NULL. This example shows what goes wrong:

library(data.table)
createNull1 <- function() {  # I do not have access to the inside of this function
  if ((i/2)%%1==0) { return("ABC") } else return(NULL)
}

createNull2 <- function() {  # I do not have access to the inside of this function
  if ( (i/2)%%1==0) { return(NULL) } else return("XYZ")
}

my.data.table <- NULL
system.time(for (i in 1:10000) {
  my.vector <- c(createNull1(),
                 createNull2(),
                 createNull1(),
                 createNull2(),
                 createNull1())
  my.data.table <- rbind(my.data.table, data.table(my.vector))
})

The resulting my.data.table has 2 columns, but it should have 5. This is why:

i <- 1
my.vector <- c(NULL, "XYZ", NULL, "XYZ", NULL)  # But 2 resulting elements, not 5
i <- 2
my.vector <- c("ABC", NULL, "ABC", NULL, "ABC") # But 3 resulting elements, not 5
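To see the effect in isolation, here is a minimal sketch with literal values in place of the function calls. It only assumes base R; the variable names `dropped` and `kept` are made up for illustration. `c()` silently discards NULL arguments, whereas a list keeps one slot per argument:

```r
# c() drops NULL arguments, so positions are lost.
dropped <- c(NULL, "XYZ", NULL, "XYZ", NULL)
length(dropped)   # 2

# A list keeps a slot for every argument, including the NULLs.
kept <- list(NULL, "XYZ", NULL, "XYZ", NULL)
length(kept)      # 5

# Which positions were NULL can then be recovered.
which(vapply(kept, is.null, logical(1)))   # 1 3 5
```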

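I cannot edit these functions, so the solution has to live inside the for loop.

I need to do this millions of times, so speed matters. The following code solves the problem...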

my.data.table <- NULL
system.time(for (i in 1:10000) {
  my.vector <- c(if (is.null(createNull1())) NA else createNull1(),
                 if (is.null(createNull2())) NA else createNull2(),
                 if (is.null(createNull1())) NA else createNull1(),
                 if (is.null(createNull2())) NA else createNull2(),
                 if (is.null(createNull1())) NA else createNull1())
  my.data.table <- rbind(my.data.table, data.table(my.vector))
})

...but it is twice as slow.

How can I create my.data.table faster?
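A likely reason for the slowdown is that the workaround calls each function twice per element: once inside is.null() and once more to fetch the value. A minimal sketch of a way around that, assuming a hypothetical helper named `na_if_null` (not part of data.table or base R), passes the result through a small wrapper so each function is evaluated only once per slot. The two functions are reproduced from the question as stand-ins:

```r
# Stand-ins reproduced from the question; both read the global i.
createNull1 <- function() if ((i / 2) %% 1 == 0) "ABC" else NULL
createNull2 <- function() if ((i / 2) %% 1 == 0) NULL else "XYZ"

# Hypothetical helper: NULL becomes a typed NA, anything else passes through.
# R evaluates a function argument at most once, so the wrapped call runs once.
na_if_null <- function(x) if (is.null(x)) NA_character_ else x

i <- 1
my.vector <- c(na_if_null(createNull1()),
               na_if_null(createNull2()),
               na_if_null(createNull1()),
               na_if_null(createNull2()),
               na_if_null(createNull1()))
length(my.vector)   # 5
```

Using NA_character_ rather than plain NA keeps the vector's type stable even when every element is missing.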

Answer 1

Try preallocating and using set:

dt = data.table(a = rep(NA_character_,10000),
                b = NA_character_, c = NA_character_,
                d = NA_character_, e = NA_character_)

for (i in 1:10000) {
  if (is.null(v1 <- createNull1()))
    v1 = NA_character_
  if (is.null(v2 <- createNull2()))
    v2 = NA_character_

  set(dt, i, 1L, v1)
  set(dt, i, 2L, v2)
  set(dt, i, 3L, v1)
  set(dt, i, 4L, v2)
  set(dt, i, 5L, v1)
}
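Note that the loop above stores v1 and v2 once per iteration and reuses them, which assumes each function returns the same value every time it is called for a given i. If that does not hold, one possible variant (a sketch, not the answerer's method) calls the function for every cell and fills a preallocated character matrix, converting to a data.table once at the end. The helper `na_if_null`, the names `n` and `m`, and the small row count are assumptions for illustration; the two functions are stand-ins copied from the question:

```r
library(data.table)

# Stand-ins reproduced from the question; both read the loop variable i.
createNull1 <- function() if ((i / 2) %% 1 == 0) "ABC" else NULL
createNull2 <- function() if ((i / 2) %% 1 == 0) NULL else "XYZ"

# Hypothetical helper: NULL becomes a typed NA, anything else passes through.
na_if_null <- function(x) if (is.null(x)) NA_character_ else x

n <- 6L                                           # small size for illustration
m <- matrix(NA_character_, nrow = n, ncol = 5L)   # preallocate all cells

for (i in seq_len(n)) {
  # One call per cell; every row always has exactly 5 entries.
  m[i, ] <- c(na_if_null(createNull1()),
              na_if_null(createNull2()),
              na_if_null(createNull1()),
              na_if_null(createNull2()),
              na_if_null(createNull1()))
}

dt <- as.data.table(m)   # columns are named V1..V5 by default
```

Whether this beats set() on a preallocated data.table would need to be measured with system.time() on the real functions.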
