Question

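For teaching purposes, I am trying to compare the memory usage of two simple functions. However, I have run into a problem, and I am wondering whether it is a bug in Rprof() or summaryRprof(). Does anyone have an idea of what I am doing wrong, or how to work around it?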

fun.rbind <- function(indata) {

  outdata <- NULL
  n <- nrow(indata)

  for (i in 1:n) {
     if (!any(is.na(indata[i,]))) outdata <- rbind(outdata, indata[i,])
  }
  outdata
}

and

fun.omit <- function(indata) {

  drop <- FALSE
  n <- ncol(indata)

  for (i in 1:n) drop <- drop | is.na(indata[, i])
  indata[!drop, ]
}

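The second version should be more efficient in both execution time and memory usage, since the first one grows its result one row at a time with rbind().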
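Before comparing profiles, it may be worth confirming that the two functions keep the same rows. The following is only a sketch on a small made-up matrix; the names `small`, `kept.rbind` and `kept.omit` are illustrative and not part of the original question. Attributes are ignored in the comparison so that only the values are checked.

```r
# The two functions from the question, repeated so this block runs on its own.
fun.rbind <- function(indata) {
  outdata <- NULL
  n <- nrow(indata)
  for (i in 1:n) {
    if (!any(is.na(indata[i, ]))) outdata <- rbind(outdata, indata[i, ])
  }
  outdata
}

fun.omit <- function(indata) {
  drop <- FALSE
  n <- ncol(indata)
  for (i in 1:n) drop <- drop | is.na(indata[, i])
  indata[!drop, ]
}

# Small illustrative input: 50 rows, 4 columns, a few values set to NA.
set.seed(42)
small <- matrix(rnorm(200), 50, 4)
small[small > 1.5] <- NA

kept.rbind <- fun.rbind(small)
kept.omit  <- fun.omit(small)

# Compare the values only; dim and dimnames are deliberately not checked.
all.equal(kept.rbind, kept.omit, check.attributes = FALSE)
```

Note that the two functions can differ on edge cases: if no row is complete, fun.rbind returns NULL, and if exactly one row is complete, fun.omit returns a plain vector rather than a one-row matrix.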

If I run the following, it works fine.

    > data.matrix <- matrix(rnorm(2000000), 100000, 20)
    > data.matrix[data.matrix > 2] <- NA
    > Rprof("fun.omit.out", memory.profiling = TRUE)
    > y <- fun.omit(data.matrix)
    > Rprof(NULL)
    > summaryRprof("fun.omit.out",  memory="both")
$by.self
           self.time self.pct total.time total.pct mem.total
"fun.omit"      0.04       50       0.08       100      38.5
"|"             0.04       50       0.04        50      22.5

$by.total
           total.time total.pct mem.total self.time self.pct
"fun.omit"       0.08       100      38.5      0.04       50
"|"              0.04        50      22.5      0.04       50

$sample.interval
[1] 0.02

$sampling.time
[1] 0.08

However, the same procedure for the first function fails with a cryptic error message.

    > Rprof("fun.rbind.out", memory.profiling = TRUE)
    > y <- fun.rbind(data.matrix)
    > Rprof(NULL)
    > summaryRprof("fun.rbind.out",  memory = "both")
Error in rowsum.default(memcounts[rep.int(seq_along(memcounts), ulen)],  : 
  unimplemented type 'NULL' in 'HashTableSetup'
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

If I do not use the memory = "both" option, everything works as expected.

    > summaryRprof("fun.rbind.out")
$by.self
            self.time self.pct total.time total.pct
"rbind"        312.62    99.64     312.62     99.64
"fun.rbind"      0.84     0.27     313.74    100.00
"any"            0.12     0.04       0.12      0.04
"!"              0.08     0.03       0.08      0.03
"is.na"          0.08     0.03       0.08      0.03

$by.total
            total.time total.pct self.time self.pct
"fun.rbind"     313.74    100.00      0.84     0.27
"rbind"         312.62     99.64    312.62    99.64
"any"             0.12      0.04      0.12     0.04
"!"               0.08      0.03      0.08     0.03
"is.na"           0.08      0.03      0.08     0.03

$sample.interval
[1] 0.02

$sampling.time
[1] 313.74

Answer 1

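In case anyone is interested, I think I have determined that the source of my problem is a bug in summaryRprof(). If I use the memory = "stats" or memory = "tseries" option, I get the expected results every time. Apparently, the memory = "both" option does not always work.

For example, the following works as I expect.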

> gctorture(on = TRUE)
> Rprof("fun.rbind.out", memory.profiling = TRUE)
> y <- fun.rbind(data.matrix)
> Rprof(NULL) 
> gctorture(on = FALSE)
> summaryRprof("fun.rbind.out", memory = "stats")
index: "fun.rbind"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
             188           173856            19658          8962426 
           nodes        max.nodes     duplications tot.duplications 
           16630         15977360                1              860 
         samples 
            1009 
--------------------------------------------------------------- 
index: "fun.rbind":"!"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
              17               17            16660            16660 
           nodes        max.nodes     duplications tot.duplications 
            1456             1456                1                1 
         samples 
               1 
--------------------------------------------------------------- 
index: "fun.rbind":"any"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
              22               82             9383            67880 
           nodes        max.nodes     duplications tot.duplications 
             796             2688                1              896 
         samples 
             741 
--------------------------------------------------------------- 
index: "fun.rbind":"is.na"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
              18               51             9561            34040 
           nodes        max.nodes     duplications tot.duplications 
             958             2520                1               51 
         samples 
              91 
--------------------------------------------------------------- 
index: "fun.rbind":"rbind"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
               8               51             5966            68440 
           nodes        max.nodes     duplications tot.duplications 
             777             2800                1              948 
         samples 
            1215

Having to summarize execution time and memory usage in two separate steps is a bit of extra work, but I can live with that.
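The two-step workaround can be wrapped in a small helper. This is only a sketch: `profile_two_step` is a hypothetical name, not a function from R itself, and it assumes that profiling is available in the R build being used. It profiles one expression with memory profiling switched on, then calls summaryRprof() twice, once for timings and once with memory = "stats", avoiding memory = "both" entirely.

```r
# Hypothetical helper (not part of R): profile one expression, then
# summarise time and memory in two separate summaryRprof() calls.
profile_two_step <- function(expr, file = tempfile(fileext = ".out")) {
  on.exit(Rprof(NULL), add = TRUE)      # stop profiling even if expr fails
  Rprof(file, memory.profiling = TRUE)
  force(expr)                           # expr is evaluated here, under the profiler
  Rprof(NULL)
  list(
    time   = summaryRprof(file),                    # timings only
    memory = summaryRprof(file, memory = "stats")   # memory statistics only
  )
}

# fun.omit from the question, repeated so this block runs on its own.
fun.omit <- function(indata) {
  drop <- FALSE
  n <- ncol(indata)
  for (i in 1:n) drop <- drop | is.na(indata[, i])
  indata[!drop, ]
}

set.seed(1)
big <- matrix(rnorm(2000000), 100000, 20)
big[big > 2] <- NA

# Repeat the call so the sampling profiler can record several samples.
res <- profile_two_step(for (k in 1:20) y <- fun.omit(big))

res$time$by.total   # execution time per function
res$memory          # memory statistics per call stack
```

If the profiled expression finishes faster than the sampling interval, no samples may be recorded and summaryRprof() can stop with an error, which is why the example repeats the call.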

Error when using summaryRprof when memory.profiling = TRUE is set in Rprof