Detecting `package_name::function_name()` with static code analysis

Time: 2017-07-22 18:38:07

Tags: r code-analysis static-code-analysis

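I am trying to dig into the internals of static code analysis packages such as `codetools` and `CodeDepends`. My immediate goal is to understand how to detect function calls written as `package_name::function_name()` or `package_name:::function_name()`. I had hoped to simply use `findGlobals()` from `codetools`, but it is not that simple.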

Example function to analyze:

f <- function(n){
  tmp <- digest::digest(n)
  stats::rnorm(n)
}

Desired functionality:

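analyze_function(f)
## [1] "digest::digest" "stats::rnorm"

Attempt with `codetools`:

library(codetools)
f = function(n) stats::rnorm(n)
findGlobals(f, merge = FALSE)
## $functions
## [1] "::"
##
## $variables
## character(0)

`CodeDepends` comes closer, but I am not sure I can always use the output to match functions to packages. I am looking for an automatic rule that connects `rnorm()` to `stats` and `digest()` to `digest`.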

library(CodeDepends)
getInputs(body(f))
## An object of class "ScriptNodeInfo"
## Slot "files":
## character(0)
##
## Slot "strings":
## character(0)
##
## Slot "libraries":
## [1] "digest" "stats"
##
## Slot "inputs":
## [1] "n"
##
## Slot "outputs":
## [1] "tmp"
##
## Slot "updates":
## character(0)
##
## Slot "functions":
##      {     :: digest  rnorm
##     NA     NA     NA     NA
##
## Slot "removes":
## character(0)
##
## Slot "nsevalVars":
## character(0)
##
## Slot "sideEffects":
## character(0)
##
## Slot "code":
## {
##     tmp <- digest::digest(n)
##     stats::rnorm(n)
## }

Edit: to be fair to `CodeDepends`, there is a great deal of customizability and power for those who understand the internals. At the moment, I am just trying to wrap my head around collectors, handlers, walkers, and so on. Apparently, it is possible to modify the standard `::` collector to make a special note of each namespaced call. For now, this is a naive attempt at something similar.

Session info:

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] purrr_0.2.2.2         bindrcpp_0.2          prophet_0.1.1
[4] Rcpp_0.12.12          multidplyr_0.0.0.9000 dplyr_0.7.2
[7] tidyr_0.6.3

loaded via a namespace (and not attached):
 [1] bindr_0.1            magrittr_1.5         munsell_0.4.3
 [4] lattice_0.20-35      colorspace_1.3-2     R6_2.2.2
 [7] rlang_0.1.1          extraDistr_1.8.6     plyr_1.8.4
[10] tools_3.3.3          parallel_3.3.3       grid_3.3.3
[13] gtable_0.2.0         StanHeaders_2.16.0-1 lazyeval_0.2.0
[16] assertthat_0.2.0     tibble_1.3.3         rstan_2.16.2
[19] gridExtra_2.2.1      ggplot2_2.2.1        codetools_0.2-15
[22] inline_0.3.14        glue_1.1.1           stringi_1.1.5
[25] scales_0.4.1         stats4_3.3.3         pkgconfig_2.0.1
[28] zoo_1.8-0

2 answers:

Answer 0 (score: 3)

If you want to extract the namespaced functions from a function, try something like this:

find_ns_functions <- function(f, found = c()) {
    if (is.function(f)) {
        # function: begin the search on its body
        return(find_ns_functions(body(f), found))
    } else if (is.call(f) && is.symbol(f[[1]]) &&
               as.character(f[[1]]) %in% c("::", ":::")) {
        # namespaced reference such as stats::rnorm: record it as text
        found <- c(found, deparse(f))
    } else if (is.recursive(f)) {
        # compound object: iterate through its sub-parts
        v <- lapply(as.list(f), find_ns_functions, found)
        found <- unique(c(found, unlist(v)))
    }
    found
}

We can test it on the example function `f` from the question: `find_ns_functions(f)` should return "digest::digest" and "stats::rnorm".
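The question asks for a rule that connects each function to its package, and the recursive walk in this answer returns deparsed strings only. As a rough sketch in base R, the same walk can keep the two sides of the operator apart. The helper name `ns_call_table` is hypothetical and is not part of `codetools` or `CodeDepends`; it inspects only the function body, not default arguments.

```r
# Hypothetical helper: collect every pkg::name / pkg:::name reference
# found in a function body or expression as a data frame.
ns_call_table <- function(x) {
  pkgs <- character(0)
  objs <- character(0)
  ops  <- character(0)

  walk <- function(e) {
    if (is.function(e)) {
      # Closures have a body to inspect; primitives yield NULL here.
      walk(body(e))
    } else if (is.call(e)) {
      fn <- e[[1]]
      if (is.symbol(fn) && as.character(fn) %in% c("::", ":::")) {
        # e is `pkg::name`: element 2 is the package, element 3 the object.
        pkgs <<- c(pkgs, as.character(e[[2]]))
        objs <<- c(objs, as.character(e[[3]]))
        ops  <<- c(ops, as.character(fn))
      } else {
        for (i in seq_along(e)) {
          # Skip empty arguments such as the second index in x[1, ].
          if (!identical(e[[i]], quote(expr = ))) walk(e[[i]])
        }
      }
    }
    invisible(NULL)
  }

  walk(x)
  unique(data.frame(package = pkgs, name = objs, operator = ops,
                    stringsAsFactors = FALSE))
}

f <- function(n) {
  tmp <- digest::digest(n)
  stats::rnorm(n)
}
res <- ns_call_table(f)
paste0(res$package, res$operator, res$name)
# [1] "digest::digest" "stats::rnorm"
```

Nothing is evaluated, so the packages named in the code do not need to be installed for the analysis to run.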

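Answer 1 (score: 1)

OK, so this was possible with `CodeDepends` before, but it was harder than it should have been. I have just committed version 0.5-4 to GitHub, which now makes this really easy. Essentially, you just need to modify the default colon handlers ("::" and/or ":::"), as follows: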

library(CodeDepends) # version >= 0.5-4
handler = function(e, collector, ..., iscall = FALSE) {
    # record the package on the left-hand side of :: as a library
    collector$library(asVarName(e[[2]]))
    ## the :: or ::: operator itself; remove if you don't want to count it as a function called
    collector$call(asVarName(e[[1]]))
    if(iscall)
        collector$call(deparse(e)) # whole expression, e.g. stats::rnorm
    else
        collector$vars(deparse(e), input=TRUE) # whole expression, e.g. stats::rnorm
}

getInputs(quote(stats::rnorm(x,y,z)), collector = inputCollector("::" = handler))
getInputs(quote(lapply( 1:10, stats::rnorm)), collector = inputCollector("::" = handler))

The first `getInputs` call above gives the result:

An object of class "ScriptNodeInfo"
Slot "files":
character(0)

Slot "strings":
character(0)

Slot "libraries":
[1] "stats"

Slot "inputs":
[1] "x" "y" "z"

Slot "outputs":
character(0)

Slot "updates":
character(0)

Slot "functions":
          :: stats::rnorm 
          NA           NA 

Slot "removes":
character(0)

Slot "nsevalVars":
character(0)

Slot "sideEffects":
character(0)

Slot "code":
stats::rnorm(x, y, z)

which, I believe, is the desired result.

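One thing to note here is the `iscall` argument I added to the colon handlers. The default handler and `applyhandlerfactory` now have special logic so that when they invoke one of the colon handlers in a situation where it is the function being called, `iscall` is set to TRUE.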
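To illustrate the distinction that `iscall` captures, here is a small base R sketch. It is not how `CodeDepends` implements it, and the helper name `ns_usage` is hypothetical. It labels a `pkg::name` expression as "called" when it sits in the function position of a call and as "value" when it is passed as an argument.

```r
# Hypothetical helper: label each pkg::name reference by how it is used.
ns_usage <- function(e, in_call_position = FALSE) {
  if (!is.call(e)) return(character(0))
  fn <- e[[1]]
  if (is.symbol(fn) && as.character(fn) %in% c("::", ":::")) {
    # e itself is pkg::name; the caller told us where it appeared.
    label <- if (in_call_position) "called" else "value"
    return(structure(label, names = deparse(e)))
  }
  # Element 1 of a call is the function position.
  out <- ns_usage(fn, in_call_position = TRUE)
  # The remaining elements are arguments.
  for (i in seq_along(e)[-1]) {
    if (!identical(e[[i]], quote(expr = ))) {
      out <- c(out, ns_usage(e[[i]], in_call_position = FALSE))
    }
  }
  out
}

ns_usage(quote(stats::rnorm(x, y, z)))      # stats::rnorm is "called"
ns_usage(quote(lapply(1:10, stats::rnorm))) # stats::rnorm is a "value"
```

The two inputs mirror the two `getInputs` calls in this answer: in the first the namespaced expression is the function being called, and in the second it is only passed along to `lapply`.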

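I have not yet done extensive testing of what happens when "stats::rnorm" appears in place of a symbol, particularly in the inputs slot when computing dependencies, but I am hopeful that all of this will continue to work. If it does not, let me know.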

~G