Question

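tl;dr: What is different about the lists of summaries produced by rentrez? Why do these lists stop working with other rentrez functions after being combined using append()?

I am using rentrez to access Pubmed. I can search for publications and download summaries without any problems. However, there must be something special about the list of summaries that I do not understand, because things fall apart when I try to combine lists using append(). I could not work out what the difference is from reading the documentation. Here is the code that lets me search Pubmed and download records without problems: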

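# load packages: rentrez for Entrez access, magrittr for %>%, data.table for rbindlist()
library(rentrez)
library(magrittr)
library(data.table)
# set search term and retmax
term_set <- '"Transcription, Genetic"[Mesh] AND "Regulatory Sequences, Nucleic Acid"[Mesh] AND 2017:2018[PDAT]'
retmax_set <- 500
# search pubmed using web history
search.l <- entrez_search(db = "pubmed", term = term_set, use_history = TRUE)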
# get summaries of search hits using web history 
for (seq_start in seq(0, search.l$count, retmax_set)) {
    if (seq_start == 0) {summary.l <- list()} 
    summary.l[[length(summary.l)+1]] <- entrez_summary(
        db = "pubmed", 
        web_history = search.l$web_history, 
        retmax = retmax_set, 
        retstart = seq_start
    )
}

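However, using summary.l <- list() and then summary.l[[length(summary.l)+1]] <- entrez_summary(...) results in a list of lists of summaries (3 sub-lists in this search). This leads to multiple for loops in the subsequent data-extraction steps (below) and is an awkward data structure.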

# extract desired information from esummary, convert to dataframe
for (i in seq_along(summary.l)) {
    if (i == 1) {faut.laut.l <- list()}
    # each sub-list is still an esummary_list, so extraction works per batch
    faut.laut <- summary.l[[i]] %>% 
        extract_from_esummary(
            c("uid", "sortfirstauthor", "lastauthor"), 
            simplify = FALSE
        )
    faut.laut.l <- c(faut.laut.l, faut.laut)
}
faut.laut.df <- rbindlist(faut.laut.l)

Using append() in the code below gives a single list of all 1334 summaries, avoiding the sub-lists.

# get summaries of search hits using web history 
for (seq_start in seq(0, search.l$count, retmax_set)) {
    if (seq_start == 0) {
        summary.append.l <- entrez_summary(
            db = "pubmed", 
            web_history = search.l$web_history, 
            retmax = retmax_set, 
            retstart = seq_start
        )
    } else {
        # append later batches; the first batch was already assigned above
        summary.append.l <- append(
            summary.append.l,
            entrez_summary(
                db = "pubmed", 
                web_history = search.l$web_history, 
                retmax = retmax_set, 
                retstart = seq_start
            )
        )
    }
}

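However, the subsequent step, extract_from_esummary(), throws an error, even though the documentation states that the argument esummaries should be a list of esummary objects.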

# extract desired information from esummary, convert to dataframe
faut.laut.append.l <- extract_from_esummary(
    esummaries = summary.append.l,
    elements = c("uid", "sortfirstauthor", "lastauthor"), 
    simplify = FALSE
)
Error in UseMethod("extract_from_esummary", esummaries) : 
no applicable method for 'extract_from_esummary' applied to an object of class "list"

faut.laut.append.df <- rbindlist(faut.laut.append.l)
Error in rbindlist(faut.laut.append.l) : 
object 'faut.laut.append.l' not found

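Searches with fewer than 500 records can be completed in a single call to entrez_summary() and do not require concatenating lists. As a result, the code below works.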

# set search term and retmax
term_set_small <- 'kadonaga[AUTH]'
retmax_set <- 500
# search pubmed using web history
search_small <- entrez_search(db = "pubmed", term = term_set_small, use_history = TRUE)
# get summaries from search with <500 hits
summary_small <- entrez_summary(
    db = "pubmed", 
    web_history = search_small$web_history, 
    retmax = retmax_set
)
# extract desired information from esummary, convert to dataframe
faut.laut_small <- extract_from_esummary(
    esummaries = summary_small,
    elements = c("uid", "sortfirstauthor", "lastauthor"), 
    simplify = FALSE
)
faut.laut_small.df <- rbindlist(faut.laut_small)

Why does append() break the summaries, and can this be avoided? Thanks.

Answer 1

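The documentation for extract_from_esummary is a little confusing on this point. What it really requires is either an esummary object or an esummary_list. Because esummary objects themselves inherit from list, I don't think we can easily make extract_from_esummary work on any list that gets thrown at it. I'll fix the documentation, and maybe think about whether there is a better design for these objects.

To solve this particular problem, there are a few fixes. One, you can re-class the list of summaries: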
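
To see why the combined object is no longer recognised, it may help to reproduce the effect offline. The sketch below uses mock objects rather than real Pubmed results; the class vector c("esummary_list", "list") and the record contents are assumptions made for illustration only. It shows that append(), which falls back to c() here, keeps the records but drops the class attribute.

```r
# Mock batches standing in for what entrez_summary() returns.
# The class vector and the record fields are assumptions for illustration.
batch1 <- structure(
    list(rec1 = list(uid = "1"), rec2 = list(uid = "2")),
    class = c("esummary_list", "list")
)
batch2 <- structure(
    list(rec3 = list(uid = "3")),
    class = c("esummary_list", "list")
)

# append() concatenates with c(), which does not carry the class over
combined <- append(batch1, batch2)

length(combined)                     # number of records after combining
class(combined)                      # class of the combined object
inherits(combined, "esummary_list")  # whether S3 dispatch can still find a method
```

With the class gone, a generic that only has methods for esummary and esummary_list has nothing to dispatch to, which matches the "no applicable method" error in the question.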

class(summary.append.l) <- c("list", "esummary_list")
extract_from_esummary(summary.append.l, "sortfirstauthor")

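should do the trick. The other option is to extract the relevant data before doing any appending. This is similar to your example, using lapply rather than a for loop: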

all_the_summs <- lapply(seq(0,50,5),  function(s) {
    entrez_summary(db="pubmed", 
                   web_history=search.l$web_history, 
                   retmax=5,  retstart=s)
})
desired_fields <- lapply(all_the_summs, extract_from_esummary, c("uid", "sortfirstauthor", "lastauthor"), simplify=FALSE)  
res <- do.call(cbind.data.frame, desired_fields)
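
Both approaches above can be tried without a network connection. The following is a minimal sketch under stated assumptions: extract_demo is a hypothetical generic standing in for extract_from_esummary, make_batch is a hypothetical helper that builds mock batches, and the record fields are made up. It is not rentrez's implementation.

```r
# Hypothetical generic with a method only for "esummary_list",
# standing in for extract_from_esummary().
extract_demo <- function(x, field) UseMethod("extract_demo")
extract_demo.esummary_list <- function(x, field) {
    vapply(x, function(rec) rec[[field]], character(1))
}

# Hypothetical helper: build a mock batch of records keyed by uid
make_batch <- function(uids) {
    recs <- lapply(uids, function(u) list(uid = u, lastauthor = paste0("Author", u)))
    structure(setNames(recs, uids), class = c("esummary_list", "list"))
}
batches <- list(make_batch(c("1", "2")), make_batch("3"))

# Combining first yields a plain list, so dispatch fails
plain <- do.call(c, batches)
failure <- tryCatch(extract_demo(plain, "lastauthor"), error = function(e) e)
inherits(failure, "error")

# Fix 1: combine, then restore the class before extracting
reclassed <- plain
class(reclassed) <- c("esummary_list", "list")
fix1 <- extract_demo(reclassed, "lastauthor")

# Fix 2: extract from each batch while it still has its class, then combine
fix2 <- unlist(lapply(batches, extract_demo, field = "lastauthor"))
```

Extracting per batch avoids touching class attributes by hand, while re-classing keeps one flat object that later rentrez calls can still consume; which one fits better depends on what the downstream code expects.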

Hopefully that provides a way forward.
