How do I check whether an entire column is sparse in R?

Time: 2015-09-15 01:14:37

Tags: r sparse-matrix

How can I check whether an entire column is sparse? I know a hacky way would be to replace all the "0" entries with NA and then check with is.na.

Is there a faster way to do this, one where I don't have to go through the entire matrix and replace all the empty values with NA?

1 answer:

Answer 0: (score: 0)

Summarizing the comments

Converting to NA is completely unnecessary here. You can check each column directly with

sapply(df, function(x) all(x == 0))
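
As a minimal sketch of what this check returns, consider a small made-up data frame (the column names a, b and c are hypothetical, not from the question). It also shows one thing to watch for: a missing value in an otherwise all-zero column makes the result NA, which `na.rm = TRUE` avoids if missing values should be ignored.

```r
# Hypothetical toy data: 'a' is all zeros, 'b' has one nonzero entry,
# 'c' is zeros plus one missing value.
df <- data.frame(a = c(0, 0, 0), b = c(0, 2, 0), c = c(0, NA, 0))

# One logical per column; the NA in 'c' propagates through all().
is_zero <- sapply(df, function(x) all(x == 0))
print(is_zero)

# Ignore missing values when testing for zeros.
is_zero_narm <- sapply(df, function(x) all(x == 0, na.rm = TRUE))
print(is_zero_narm)

# Names of the columns that are entirely zero (ignoring NA).
zero_cols <- names(df)[is_zero_narm]
print(zero_cols)
```

Whether NA should count as "empty" depends on the data, so the `na.rm = TRUE` variant is an assumption about intent rather than part of the original answer.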

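Depending on your data, you have two other options:

  • For non-negative numeric data, colSums(x) == 0 (with negative values present, entries could cancel to a zero sum)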
  • sapply(df, function(x) x[1] == 0 && length(unique(x)) == 1)
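
The non-negativity condition on the colSums approach can be illustrated with a small sketch. The data frame and its column names are made up for illustration, and the `abs()` variant at the end is an extra suggestion, not something from the original answer.

```r
# Hypothetical toy data: 'neg' sums to zero but is not all zeros.
df <- data.frame(zeros = c(0, 0, 0), pos = c(0, 1, 0), neg = c(-1, 0, 1))

# Sum-based check: reports 'neg' as all-zero because -1 and 1 cancel.
by_sum <- colSums(df) == 0
print(by_sum)

# First value is zero and the column has a single distinct value.
by_unique <- sapply(df, function(x) x[1] == 0 && length(unique(x)) == 1)
print(by_unique)

# Possible workaround (an assumption, not from the answer): sum absolute values.
by_abs <- colSums(abs(df)) == 0
print(by_abs)
```

Only the sum-based check flags the `neg` column, which is why it is suggested for non-negative data only.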

Benchmark

Specs:

  • MacBook Pro Retina, late 2013, 2.8 GHz Intel Core i7, 16 GB 1600 MHz DDR3
  • Mac OS 10.10.5
  • R version 3.2.2 (2015-08-14), installed as the bundled application from CRAN.

Code:

library(microbenchmark)

ncol <- 1000L
nrow <- 10000L
dense_frac <- 1/3
## number of columns that will receive nonzero data
n_dense <- as.integer(dense_frac * ncol)
x <- data.frame(matrix(0, nrow, ncol))
dense_cols <- sample(ncol, n_dense)

## direct check: every entry equals zero
all_zero <- function(x) {
  all(x == 0)
}

## first entry is zero and the column holds a single distinct value
first_zero_all_same <- function(x) {
  x[1] == 0 && length(unique(x)) == 1L
}

## the approach from the question: turn zeros into NA, then test is.na
zero_to_na <- function(x) {
  x[x == 0] <- NA
  all(is.na(x))
}

bench <- function(x) microbenchmark(
  colsum.zero = colSums(x) == 0,
  raw.colsum.zero = .colSums(as.matrix(x), nrow, ncol) == 0,
  apply.all.zero = apply(x, 2, all_zero),
  sapply.all.zero = sapply(x, all_zero),
  apply.first.zero.all.same = apply(x, 2, first_zero_all_same),
  sapply.first.zero.all.same = sapply(x, first_zero_all_same),
  apply.convert.to.na = apply(x, 2, zero_to_na),
  sapply.convert.to.na = sapply(x, zero_to_na),
  times = 10,
  control = list(order = "block")
)

set.seed(43770)
gc()

## non-negative integers
## (lambda = 1 is an assumed rate; rpois() requires a lambda argument)
x[dense_cols] <- replicate(n_dense, rpois(nrow, lambda = 1))
nneg_int <- bench(x)

## non-negative decimals
x[dense_cols] <- replicate(n_dense, abs(rnorm(nrow)))
nneg_dec <- bench(x)

Results:

print(nneg_int)
# Unit: milliseconds
#                        expr   min    lq  mean median    uq   max neval    cld
#                 colsum.zero  46.1  46.9  54.1   52.4  63.4  65.8    10 a     
#             raw.colsum.zero  46.6  48.3  59.5   53.8  57.2 120.7    10 a     
#              apply.all.zero 247.8 301.3 301.3  306.5 309.9 316.0    10    d  
#             sapply.all.zero  39.9  43.3  45.3   45.5  46.5  51.0    10 a     
#   apply.first.zero.all.same 494.0 494.5 509.5  515.6 518.0 526.2    10      f
#  sapply.first.zero.all.same 236.5 244.4 250.9  250.0 256.9 261.9    10   c   
#         apply.convert.to.na 436.1 479.8 481.6  486.7 492.2 498.0    10     e 
#        sapply.convert.to.na 220.6 226.6 230.6  229.6 234.3 239.7    10  b    

print(nneg_dec)
# Unit: milliseconds
#                        expr   min    lq  mean median    uq   max neval   cld
#                 colsum.zero  45.0  47.4  58.3   52.2  60.6 108.2    10 a    
#             raw.colsum.zero  45.2  53.8  55.0   54.7  58.6  65.7    10 a    
#              apply.all.zero 297.9 304.0 318.8  314.7 323.3 367.5    10   c  
#             sapply.all.zero  40.0  44.0  46.5   44.4  46.8  59.1    10 a    
#   apply.first.zero.all.same 502.9 534.4 536.2  539.8 543.5 547.6    10     e
#  sapply.first.zero.all.same 240.0 243.5 250.9  249.7 258.1 264.0    10  b   
#         apply.convert.to.na 492.5 493.1 498.7  498.2 499.4 518.0    10    d 
#        sapply.convert.to.na 228.8 236.0 240.4  238.4 244.1 253.8    10  b  

On this sample data, the best option appears to be using sapply to check all(x == 0), with the colSums approach second best.