I am trying to dig into the internals of static code analysis packages such as codetools and CodeDepends. My immediate goal is to understand how to detect function calls written as package_name::function_name() or package_name:::function_name(). I had hoped to simply use findGlobals() from codetools, but it is not that simple.
Example function to analyze:
f <- function(n){
  tmp <- digest::digest(n)
  stats::rnorm(n)
}
Desired functionality:
analyze_function(f)
## [1] "digest::digest" "stats::rnorm"
Attempt with codetools:
library(codetools)
f = function(n) stats::rnorm(n)
findGlobals(f, merge = FALSE)
## $functions
## [1] "::"
##
## $variables
## character(0)
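The result above makes sense once you look at how R parses a namespaced call: the head of stats::rnorm(n) is not a symbol but itself a call to the :: function, so a walker that only records the function being called reports "::". A small base-R sketch for inspecting this (the variable names e and head_call are purely illustrative):

```r
# Parse a namespaced call without evaluating it.
e <- quote(stats::rnorm(n))

# The head of the call is itself a call: `::`(stats, rnorm).
head_call <- e[[1]]
is.call(head_call)

# Its three elements are the operator, the package, and the function name.
op  <- as.character(head_call[[1]])  # "::"
pkg <- as.character(head_call[[2]])  # "stats"
fun <- as.character(head_call[[3]])  # "rnorm"

paste0(pkg, op, fun)
```

Any detector for pkg::fun therefore has to look one level deeper than the call head, which is what the answers in this thread do.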
CodeDepends gets closer, but I am not sure I can always use the output to match functions to packages. I am looking for an automatic rule that connects rnorm() to stats and digest() to digest.
library(CodeDepends)
getInputs(body(f))
## An object of class "ScriptNodeInfo"
## Slot "files":
## character(0)
##
## Slot "strings":
## character(0)
##
## Slot "libraries":
## [1] "digest" "stats"
##
## Slot "inputs":
## [1] "n"
##
## Slot "outputs":
## [1] "tmp"
##
## Slot "updates":
## character(0)
##
## Slot "functions":
##      {     :: digest  rnorm
##     NA     NA     NA     NA
##
## Slot "removes":
## character(0)
##
## Slot "nsevalVars":
## character(0)
##
## Slot "sideEffects":
## character(0)
##
## Slot "code":
## {
##     tmp <- digest::digest(n)
##     stats::rnorm(n)
## }
EDIT: To be fair to CodeDepends, there is a great deal of customizability and power for those who understand the internals. At the moment, I am just trying to wrap my head around collectors, handlers, walkers, and so on. Apparently, it is possible to modify the standard :: collector to make special note of each namespaced call. For now, here is a naive attempt at something similar.
Answer 0 (score: 3)
If you want to extract the namespaced functions from a function, try something like this:
find_ns_functions <- function(f, found = c()) {
  if (is.function(f)) {
    # function: begin the search on its body
    return(find_ns_functions(body(f), found))
  } else if (is.call(f) && is.symbol(f[[1]]) &&
             as.character(f[[1]]) %in% c("::", ":::")) {
    # f is pkg::name or pkg:::name; comparing the head as a symbol keeps
    # the condition length one even when the call head is a long expression
    found <- c(found, deparse(f))
  } else if (is.recursive(f)) {
    # compound object: iterate through its sub-parts
    v <- lapply(as.list(f), find_ns_functions, found)
    found <- unique(c(found, unlist(v)))
  }
  found
}
We can test it with the two-statement example function f defined at the top of the question:
find_ns_functions(f)
## [1] "digest::digest" "stats::rnorm"
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid
locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] purrr_0.2.2.2 bindrcpp_0.2 prophet_0.1.1
[4] Rcpp_0.12.12 multidplyr_0.0.0.9000 dplyr_0.7.2
[7] tidyr_0.6.3
loaded via a namespace (and not attached):
[1] bindr_0.1 magrittr_1.5 munsell_0.4.3
[4] lattice_0.20-35 colorspace_1.3-2 R6_2.2.2
[7] rlang_0.1.1 extraDistr_1.8.6 plyr_1.8.4
[10] tools_3.3.3 parallel_3.3.3 grid_3.3.3
[13] gtable_0.2.0 StanHeaders_2.16.0-1 lazyeval_0.2.0
[16] assertthat_0.2.0 tibble_1.3.3 rstan_2.16.2
[19] gridExtra_2.2.1 ggplot2_2.2.1 codetools_0.2-15
[22] inline_0.3.14 glue_1.1.1 stringi_1.1.5
[25] scales_0.4.1 stats4_3.3.3 pkgconfig_2.0.1
[28] zoo_1.8-0
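Strings of the form "pkg::fun", such as those collected by find_ns_functions() above, can be split into the package-to-function mapping the question asks about. The following is a minimal base-R sketch, not part of codetools or CodeDepends. The helper name split_ns_call, its column names, and the example entry "pkg:::helper" are all hypothetical.

```r
# Split strings such as "stats::rnorm" into package and function columns.
# split_ns_call is a hypothetical helper name chosen for this sketch.
split_ns_call <- function(x) {
  # For ordinary names, "pkg::fun" and "pkg:::fun" both split into two parts.
  parts <- strsplit(x, ":::?")
  data.frame(
    package  = vapply(parts, `[`, character(1), 1L),
    fun      = vapply(parts, `[`, character(1), 2L),
    internal = grepl(":::", x, fixed = TRUE),  # TRUE when ::: was used
    stringsAsFactors = FALSE
  )
}

calls <- split_ns_call(c("digest::digest", "stats::rnorm", "pkg:::helper"))
calls
```

Keeping the ::: information in its own column makes it possible to tell exported from internal references later on.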
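Answer 1 (score: 1)
OK, so CodeDepends could do this before, but it was harder than it should have been. I have just committed version 0.5-4 to GitHub, which now makes this really simple. Basically, you just modify the default colon handlers ("::" and/or ":::"), like so: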
library(CodeDepends) # version >= 0.5-4
handler = function(e, collector, ..., iscall = FALSE) {
  ## record the package named on the left-hand side of :: as a library
  collector$library(asVarName(e[[2]]))
  ## :: or ::: name, remove if you don't want to count those as functions called
  collector$call(asVarName(e[[1]]))
  if(iscall)
    collector$call(deparse((e))) # whole expr, i.e. stats::rnorm
  else
    collector$vars(deparse((e)), input=TRUE) # whole expr, i.e. stats::rnorm
}
getInputs(quote(stats::rnorm(x,y,z)), collector = inputCollector("::" = handler))
getInputs(quote(lapply( 1:10, stats::rnorm)), collector = inputCollector("::" = handler))
The first getInputs call above gives the result:
An object of class "ScriptNodeInfo"
Slot "files":
character(0)
Slot "strings":
character(0)
Slot "libraries":
[1] "stats"
Slot "inputs":
[1] "x" "y" "z"
Slot "outputs":
character(0)
Slot "updates":
character(0)
Slot "functions":
:: stats::rnorm
NA NA
Slot "removes":
character(0)
Slot "nsevalVars":
character(0)
Slot "sideEffects":
character(0)
Slot "code":
stats::rnorm(x, y, z)
Which is, I believe, what you want.
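To get from such a result to a plain vector of namespaced calls, the names shown in the "functions" slot can be filtered. This is only a sketch: keep_ns_names is a hypothetical helper, and reading the names with names(info@functions) is an assumption based on the printed output above, not on documented CodeDepends API.

```r
# Keep only entries of the form pkg::fun or pkg:::fun, dropping the bare
# "::" operator that the handler also records.
# keep_ns_names is a hypothetical helper name chosen for this sketch.
keep_ns_names <- function(fun_names) {
  grep("^[^:]+:::?[^:]", fun_names, value = TRUE)
}

# Names as they appear in the "functions" slot printed above.
ns_names <- keep_ns_names(c("::", "stats::rnorm"))
ns_names

# Assumed usage with a ScriptNodeInfo object `info` returned by getInputs():
# keep_ns_names(names(info@functions))
```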
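One thing to note here is the iscall argument that I added to the colon handlers. The default handler and applyhandlerfactory now have special logic so that, when they invoke one of the colon handlers for a function that is being called, iscall is set to TRUE.
I have not yet tested extensively what happens when "stats::rnorm" appears in place of a symbol, particularly in the inputs slot when calculating dependencies, but I am hopeful that it will all continue to work. If it doesn't, let me know.
~G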