Oh boy. I'm having a hard time removing for loops from my code because I find them so intuitive, and I first learned C++. Below, I take the IDs from a search (copd in this case) and use each ID to retrieve its full XML record, from which I save the affiliation/location into a vector. I don't know how to speed this up; it takes about 5 minutes to run 700 IDs, and most searches have 70,000+ IDs. Thanks for any guidance.
library(rentrez)
library(XML)
# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count
# set max to count
id <- entrez_search(db = "pubmed", term = "copd", retmax = count)$ids
# empty vector that will soon contain locations
location <- character()
# get all location data
for (i in 1:count)
{
# fetch the full record for this ID
test <- entrez_fetch(db = "pubmed", id = id[i], rettype = "XML")
# convert to XML
test_list <- XML::xmlToList(test)
# retrieve location
location <- c(location, test_list$PubmedArticle$MedlineCitation$Article$AuthorList$Author$AffiliationInfo$Affiliation)
}
Answer 0 (score: 2)
This might give you a start - it seems that multiple records can be pulled down in a single request.
library(rentrez)
library(xml2)
# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count
# set max to count
id_search <- entrez_search(db = "pubmed", term = "copd", retmax = count, use_history = TRUE)
# get all
document <- entrez_fetch(db = "pubmed", rettype = "XML", web_history = id_search$web_history)
document_list <- as_list(read_xml(document))
The problem is that this is still time-consuming, because there are a huge number of documents. Curiously, when I tried it, it returned exactly 10,000 articles - there may be a limit on how much you can get back at once.
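If that cap is what you're hitting, one workaround is to page through the history server in chunks using retstart/retmax, as the rentrez web-history vignette suggests. This is only a sketch (the chunk size of 500 and the sleep interval are arbitrary choices, untested against the full 70,000+ result set):

```r
library(rentrez)

# fetch records in batches of 500 against the same web history
chunk <- 500
docs <- character()
for (start in seq(0, id_search$count - 1, by = chunk)) {
  docs <- c(docs,
            entrez_fetch(db = "pubmed", rettype = "XML",
                         web_history = id_search$web_history,
                         retstart = start, retmax = chunk))
  Sys.sleep(0.4)  # stay under NCBI's request-rate limits
}
```

Each element of docs is then one XML document covering 500 articles, which should be far fewer HTTP round trips than one request per ID.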
Then you can use something like the purrr package to start extracting the information you need.
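As a rough sketch of that extraction step, assuming document_list from above: purrr::pluck can walk the nested list safely, returning NA when a record lacks the path (note the path below mirrors the asker's original code and only grabs the first author's first affiliation; real records vary, so you may need a different path):

```r
library(purrr)

# one affiliation string (or NA) per article in the fetched set
locations <- map_chr(
  document_list$PubmedArticleSet,
  pluck, "MedlineCitation", "Article", "AuthorList",
  "Author", "AffiliationInfo", "Affiliation", 1,
  .default = NA_character_
)
```

Because pluck never errors on a missing element, this avoids the loop entirely and tolerates articles with no affiliation data.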