Question

我是数学家，所以对IT知识了解不多。而且我想知道是否可以更快地使用具有多个条件的语句，或者更多的if / else if语句，如下面的示例所示。考虑到我有一个非常大的数据表（具有数百万行），并且在函数中具有此 if语句，我将其应用于一列的每一行并将结果存储在新列中。而且我只想了解这两种方法之间是否有差异（更快/更慢/相同）。

    if (is.na(numerator) == TRUE){
        result = 0
    }  else if (numerator == 0){
        result = 0
    }  else if (is.na(denominator) == TRUE){
        result = max
    }  else if (denominator == 0){
        result = max
    }  else {
        result = numerator/denominator
    }

OR

    if (is.na(numerator) == TRUE || numerator == 0){
        result = 0
    }  else if (is.na(denominator) == TRUE || denominator == 0){
        result = max
    }  else {
        result = numerator/denominator
    }

Answer 1

嗨，

为了改善上面的代码形式，我建议您首先使用if语句，该语句最常出现。这将加快代码的速度，因为在大多数情况下，直到结束时才检查if else区域。我对此做了一个很小的测试：

df <- data.frame(check = sample(c(0,1),size = 10000, replace = T, prob = c(0.1,0.9)),
                 solution = rep(NA, 10000))

start_t <- Sys.time()
for (idx in seq_len(nrow(df))) {
  if(df[idx, "check"]==0) {
    df[idx, "solution"] <- "zero"
  } else if (df[idx, "check"]==1) {
    df[idx, "solution"] <- "one"
  }
}
print(Sys.time()-start_t)

此代码在我的系统Time difference of 0.7524531 secs上需要。您可以看到数据帧中会出现多于零的1。因此，我将切换检查语句，并在开始处设置“ if check == 1”。

df <- data.frame(check = sample(c(0,1),size = 10000, replace = T, prob = c(0.1,0.9)),
                 solution = rep(NA, 10000))

start_t <- Sys.time()
for (idx in seq_len(nrow(df))) {
  if(df[idx, "check"]==1) {
    df[idx, "solution"] <- "one"
  } else if (df[idx, "check"]==0) {
    df[idx, "solution"] <- "zero"
  }
}
print(Sys.time()-start_t)

此代码仅需Time difference of 0.6977119 secs。它快了8％，并且执行与上述示例相同的工作。希望您能理解我的观点，并祝您代码顺利。

Answer 2

让我们做一个简单的实验！

虚拟数据

data <- data.frame(numerator = sample(c(0:9, NA), 10000, replace = T),
                   denominator = sample(c(0:9, NA), 10000, replace = T))

由两个“ if”条件组成的两个功能

f1 <- function(x){
  num <- x[1] ; denom <- x[2]
  if (is.na(num)){
    result = 0
  }  else if (num == 0){
    result = 0
  }  else if (is.na(denom)){
    result = Inf
  }  else if (denom == 0){
    result = Inf
  }  else {
    result = num / denom
  }
  return(result)
}

f2 <- function(x){
  num <- x[1] ; denom <- x[2]
  if (is.na(num) || num == 0){
    result = 0
  }  else if (is.na(denom) || denom == 0){
    result = Inf
  }  else {
    result = num / denom
  }
  return(result)
}

基准分析

library(microbenchmark)
library(ggplot2)

res <- microbenchmark(
  type1 = {
    quotient1 <- apply(data, 1, f1)
  }, type2 = {
    quotient2 <- apply(data, 1, f2)
  }, times = 100
)

res
# Unit: milliseconds
#  expr      min       lq     mean   median       uq       max
# type1 21.91925 23.70445 27.16314 25.52339 26.90110 122.91710
# type2 22.00139 23.64297 26.11080 25.04576 26.46136  42.62506

autoplot(res)

结论

您可以尝试几次基准测试，然后发现两个if条件之间没有显着差异。

如果具有多个条件的语句更好，或者如果具有更多条件，则更好？

2 个答案:

虚拟数据

由两个“ if”条件组成的两个功能

基准分析

结论