Accessing elements from R's Rd files?

Time: 2013-07-28 14:13:19

Tags: r parsing rd

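I want to go through a package and find out who is listed as the author in the help file of each function.

I looked for a function that extracts elements from R's help files but could not find one. The closest thing I found is this post by Noam Ross.

Does such a function exist? (If not, I suppose I will hack Noam's code to parse the Rd files and pick out the specific elements I am interested in.)

Thanks, Tal.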

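Potential code example:

get_field_from_r_help(topic = "lm", field = "Description")
# output:

'lm' is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although 'aov' may provide a more convenient interface for these).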

2 answers:

Answer 0: (score: 5)

This document by Duncan Murdoch on parsing Rd files will be helpful, as will this SO post.

Based on those, you could try something like the following:

getauthors <- function(package) {
    # Rd_db() returns the parsed Rd objects of a package, one per help file
    db <- tools::Rd_db(package)
    authors <- lapply(db, function(x) {
        # RdTags() is not exported from tools, hence the triple colon
        tags <- tools:::RdTags(x)
        if ("\\author" %in% tags) {
            # return a crazy list of results
            # x[which(tags == "\\author")]
            # return something a little cleaner
            paste(unlist(x[which(tags == "\\author")]), collapse = "")
        } else {
            NULL
        }
    })
    gsub("\n", "", unlist(authors)) # further cleanup
}

Then we can run it on a package or two:

> getauthors("knitr")
                                                                                     d:/RCompile/CRANpkg/local/3.0/knitr/man/eclipse_theme.Rd 
                                                                                                                     "  Ramnath Vaidyanathan" 
                                                                                         d:/RCompile/CRANpkg/local/3.0/knitr/man/image_uri.Rd 
                                                                                                                    "  Wush Wu and Yihui Xie" 
                                                                                      d:/RCompile/CRANpkg/local/3.0/knitr/man/imgur_upload.Rd 
                                                                              "  Yihui Xie, adapted from the imguR package by Aaron  Statham" 
                                                                                          d:/RCompile/CRANpkg/local/3.0/knitr/man/knit2pdf.Rd 
                                                                                         "  Ramnath Vaidyanathan, Alex Zvoleff and Yihui Xie" 
                                                                                           d:/RCompile/CRANpkg/local/3.0/knitr/man/knit2wp.Rd 
                                                                                                          "  William K. Morris and Yihui Xie" 
                                                                                        d:/RCompile/CRANpkg/local/3.0/knitr/man/knit_theme.Rd 
                                                                                                       "  Ramnath Vaidyanathan and Yihui Xie" 
                                                                                     d:/RCompile/CRANpkg/local/3.0/knitr/man/knitr-package.Rd 
                                                                                                            "  Yihui Xie <http://yihui.name>" 
                                                                                        d:/RCompile/CRANpkg/local/3.0/knitr/man/read_chunk.Rd 
                      "  Yihui Xie; the idea of the second approach came from  Peter Ruckdeschel (author of the SweaveListingUtils  package)" 
                                                                                       d:/RCompile/CRANpkg/local/3.0/knitr/man/read_rforge.Rd 
                                                                                                          "  Yihui Xie and Peter Ruckdeschel" 
                                                                                           d:/RCompile/CRANpkg/local/3.0/knitr/man/rst2pdf.Rd 
                                                                                                               "  Alex Zvoleff and Yihui Xie" 
                                                                                              d:/RCompile/CRANpkg/local/3.0/knitr/man/spin.Rd 
"  Yihui Xie, with the original idea from Richard FitzJohn  (who named it as sowsear() which meant to make a  silk purse out of a sow's ear)" 

And perhaps on tools:

> getauthors("tools")
                       D:/murdoch/recent/R64-3.0/src/library/tools/man/bibstyle.Rd 
                                                                "  Duncan Murdoch" 
                   D:/murdoch/recent/R64-3.0/src/library/tools/man/checkPoFiles.Rd 
                                                                "  Duncan Murdoch" 
                        D:/murdoch/recent/R64-3.0/src/library/tools/man/checkRd.Rd 
                                                  "  Duncan Murdoch, Brian Ripley" 
                     D:/murdoch/recent/R64-3.0/src/library/tools/man/getDepList.Rd 
                                                                   " Jeff Gentry " 
                      D:/murdoch/recent/R64-3.0/src/library/tools/man/HTMLlinks.Rd 
                                                    "Duncan Murdoch, Brian Ripley" 
            D:/murdoch/recent/R64-3.0/src/library/tools/man/installFoundDepends.Rd 
                                                                     "Jeff Gentry" 
                D:/murdoch/recent/R64-3.0/src/library/tools/man/makeLazyLoading.Rd 
                                                   "Luke Tierney and Brian Ripley" 
                       D:/murdoch/recent/R64-3.0/src/library/tools/man/parse_Rd.Rd 
                                                                " Duncan Murdoch " 
                     D:/murdoch/recent/R64-3.0/src/library/tools/man/parseLatex.Rd 
                                                                  "Duncan Murdoch" 
                        D:/murdoch/recent/R64-3.0/src/library/tools/man/Rd2HTML.Rd 
                                                  "  Duncan Murdoch, Brian Ripley" 
                 D:/murdoch/recent/R64-3.0/src/library/tools/man/Rd2txt_options.Rd 
                                                                  "Duncan Murdoch" 
                   D:/murdoch/recent/R64-3.0/src/library/tools/man/RdTextFilter.Rd 
                                                                "  Duncan Murdoch" 
                D:/murdoch/recent/R64-3.0/src/library/tools/man/SweaveTeXFilter.Rd 
                                                                  "Duncan Murdoch" 
                       D:/murdoch/recent/R64-3.0/src/library/tools/man/texi2dvi.Rd 
                     "  Originally Achim Zeileis but largely rewritten by R-core." 
                  D:/murdoch/recent/R64-3.0/src/library/tools/man/tools-package.Rd 
"  Kurt Hornik and Friedrich Leisch  Maintainer: R Core Team R-core@r-project.org" 
                D:/murdoch/recent/R64-3.0/src/library/tools/man/vignetteDepends.Rd 
                                                                   " Jeff Gentry " 
                 D:/murdoch/recent/R64-3.0/src/library/tools/man/vignetteEngine.Rd 
                                            "Duncan Murdoch and Henrik Bengtsson." 
                  D:/murdoch/recent/R64-3.0/src/library/tools/man/writePACKAGES.Rd 
                                                        "  Uwe Ligges and R-core."

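Some functions have no author field, so they are simply dropped when unlist is called at the end of getauthors, but the code could be modified slightly to return NULL values for them instead.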
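One way to keep the help files that lack an author field is to return NA for them, since unlist discards NULL entries. The following is only a minimal sketch of that idea, built on the same tools::Rd_db approach as getauthors; the name getauthors_na is made up for illustration, and tools:::RdTags is an unexported function that may change between R versions.

```r
# Hypothetical variant of getauthors(): one entry per help file,
# with NA where no \author field is present.
getauthors_na <- function(package) {
    db <- tools::Rd_db(package)
    vapply(db, function(x) {
        tags <- tools:::RdTags(x)  # unexported helper in tools
        if (!"\\author" %in% tags) {
            return(NA_character_)
        }
        out <- paste(unlist(x[tags == "\\author"]), collapse = "")
        gsub("\n", "", out)
    }, character(1))
}

res <- getauthors_na("tools")
```

With this version, names(res)[is.na(res)] lists the help files that have no author field.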

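Also, further parsing gets a bit harder, because package authors seem to use this field in very different ways. devtools has only a single author field. car has a bunch of them, each containing an email address. And so on. Still, this gets you the available information, and you should be able to take it further from there.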
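As one example of taking the result further, email addresses can be pulled out of the author strings with a regular expression. This is a rough sketch only: extract_emails is a hypothetical helper, the pattern is a deliberately simple approximation of an email address, and the two input strings are copied from the tools output shown earlier.

```r
# Hypothetical helper: return the email-like substrings found in each element of x.
extract_emails <- function(x) {
    # simple approximation: local part, "@", domain, dot, alphabetic suffix
    pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
    regmatches(x, gregexpr(pattern, x))
}

authors <- c(
    "  Kurt Hornik and Friedrich Leisch  Maintainer: R Core Team R-core@r-project.org",
    "  Duncan Murdoch, Brian Ripley"
)
emails <- extract_emails(authors)
```

The result is a list with one character vector per input string, which is empty when the string contains no address.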

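Note: a previous version of this answer offered a solution for the case where you have the full path to an Rd file, but it did not work when you tried it on an installed package. Following Tyler's suggestion, I have worked out a more complete solution.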
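For the full-path case mentioned in the note, a sketch along these lines may be useful. It is not the earlier version of this answer, just an illustration that uses tools::parse_Rd on a single file; get_rd_field and the demo Rd file are made up for the example, and the field argument is assumed to name a top-level Rd section such as author or description.

```r
# Hypothetical helper: pull one top-level field out of a single Rd file.
get_rd_field <- function(path, field) {
    rd <- tools::parse_Rd(path)
    # top-level elements of a parsed Rd object are labelled via the "Rd_tag" attribute
    tags <- vapply(rd, function(e) {
        tag <- attr(e, "Rd_tag")
        if (is.null(tag)) "" else tag
    }, character(1))
    hits <- rd[tags == paste0("\\", field)]
    if (length(hits) == 0) {
        return(NA_character_)
    }
    # flatten the nested pieces into one string and tidy the whitespace
    trimws(gsub("\n", " ", paste(unlist(hits), collapse = "")))
}

# A made-up Rd file, written to a temporary location for the demonstration.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
    "\\name{demo}",
    "\\alias{demo}",
    "\\title{Demo topic}",
    "\\description{A made-up help page.}",
    "\\author{Jane Doe}"
), rd_file)

author <- get_rd_field(rd_file, "author")
description <- get_rd_field(rd_file, "description")
```

Passing a different field name, for example "title", retrieves that section in the same way, and a field that is absent from the file yields NA.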

Answer 1: (score: 1)

Here is my approach, using the suggestions others have made:

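package <- "qdap"
# names of all functions in the package namespace
funs <- unclass(lsf.str(envir = asNamespace(package)))

out <- sapply(funs, function(x) {
    # render the help page for x as plain text, one element per line
    x <- try(capture.output(tools:::Rd2txt(utils:::.getHelpFile(as.character(help(x, help_type="text"))))))
    # Rd2txt underlines section titles with backspace overstrikes ("_\b" before each character)
    Auth_lines <- grep("_\bA_\bu_\bt_\bh_\bo_\br(_\bs):", x, fixed = TRUE)
    if (identical(Auth_lines, integer(0))) {
        return(NA)
    }
    # take the line two below the title and trim surrounding whitespace
    gsub("^\\s+|\\s+$", "", x[Auth_lines + 2])
})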

## To look at just the ones with author fields:
out[!sapply(out, is.na)]

## > out[!sapply(out, is.na)]
##                                                         beg2char 
##                   "Josh O'Brien, Justin Haynes and Tyler Rinker" 
##                                                         bracketX 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                    bracketXtract 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                         char2end 
##                   "Josh O'Brien, Justin Haynes and Tyler Rinker" 
##                                                 cm_df.transcript 
## "DWin, Gavin Simpson and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                            gantt 
##           "DigEmAll (<URL: stackoverflow.com>) and Tyler Rinker" 
##                                                       gantt_wrap 
##     "Andrie de Vries and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                             genX 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                        genXtract 
##       "Martin Morgan and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                             hash 
##      "Bryan Goodrich and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                         name2sex 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                  read.transcript 
##      "Bryan Goodrich and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                      sentCombine 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                        sentSplit 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                              TOT 
##    "Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>." 
##                                                          v.outer 
##   "Vincent Zoonekynd and Tyler Rinker <tyler.rinker@gmail.com>."