Question

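I am using the "scholar" package in R. I want to build a co-authorship social network for my research group. I created a data frame of researchers as follows: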

members <- data.frame(name = c("Linton C Freeman", "Ronald Burt", "Stephen P. Borgatti"),
                      scholar_id = c("quiVMg8AAAAJ", "g-R8XdkAAAAJ", "hlk4a4gAAAAJ"),
                      stringsAsFactors = FALSE)

Then I wrote a for loop to get each researcher's publications:

pubs <- get_publications(members$scholar_id[1])
for(i in 2:nrow(members)){
  pubs_ <- get_publications(members$scholar_id[i])
  pubs <- rbind(pubs, pubs_)
}

To get the list of co-authors, I need to use the following syntax:

coauthors <- get_complete_authors(scholar_id, pubid)

For example:

coauthors <- get_complete_authors(members$scholar_id[1], pubs$pubid[1])

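I want to iterate over the members to get all co-authors into a data frame. I think I need nested loops, one over pubs and one over members. I also need to add a pause statement inside the loop to avoid HTTP 503 errors. My question is: how do I construct a loop that does this? Ultimately, I want a data frame with pubid and authors. I know how to create an edge list from that. Please help.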

Answer 1

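Here is how I would approach it, using a single data.frame for everything. I do it this way because it looks like Google Scholar uses the same ID to refer to different publications, which makes life interesting.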

library(scholar)
library(tidyverse)

member <- tibble(name = c("Linton C Freeman", "Ronald Burt", "Stephen P. Borgatti"),
                 scholar_id = c("quiVMg8AAAAJ", "g-R8XdkAAAAJ", "hlk4a4gAAAAJ"))

bib_data <- member %>% 
  #this lets mutate work on each row independently
  rowwise() %>% 
  #produce a dataframe for each row
  mutate(pubs = list(get_publications(scholar_id))) %>% 
  #drop the rowwise grouping so row_number() counts across the whole table
  ungroup() %>% 
  #expand the dataframes
  unnest(pubs) %>% 
  #I've included this to keep the requests down for a demonstration
  filter(row_number() < 6) %>% 
  #as above
  rowwise() %>% 
  #this now uses the scholar_id and pubid from each row to get the coauthor
  #information as a new column
  mutate(coauths = get_complete_authors(scholar_id, pubid))

This way you can avoid for loops entirely and, hopefully, keep all the records clearly organized.
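If you would still rather use an explicit loop with a pause between requests, as the question asks, something along these lines might work. This is only a sketch: `collect_coauthors`, `fetch`, and `pause` are names made up for illustration, the 2-second default is an arbitrary guess rather than a known safe rate, and it assumes the input has `scholar_id` and `pubid` columns like `bib_data` above.

```r
# Sketch: fetch co-authors for every (scholar_id, pubid) row, pausing
# between requests. `fetch` and `pause` are illustrative parameter names.
collect_coauthors <- function(pubs,
                              fetch = scholar::get_complete_authors,
                              pause = 2) {
  ids    <- as.character(pubs$scholar_id)
  pubids <- as.character(pubs$pubid)
  authors <- character(length(ids))

  for (i in seq_along(ids)) {
    # tryCatch keeps one failed request (e.g. an HTTP 503) from aborting
    # the whole run; the failed row is recorded as NA instead
    authors[i] <- tryCatch(
      paste(fetch(ids[i], pubids[i]), collapse = ", "),
      error = function(e) NA_character_
    )
    Sys.sleep(pause)  # wait before sending the next request
  }

  data.frame(scholar_id = ids, pubid = pubids, authors = authors,
             stringsAsFactors = FALSE)
}
```

With the scholar package installed, `collect_coauthors(bib_data)` would use the real `get_complete_authors`. Passing the fetch function as an argument also lets you try the loop with a stub before hitting Google Scholar.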

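Processing the co-author information is another challenge, since the formatting (especially of abbreviated names) appears to be inconsistent...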
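As a starting point for that step, here is a rough base-R sketch that turns the author strings into one row per publication and author, and then into co-author pairs. It assumes the authors come back as a single comma-separated string; `split_authors` and `edge_list` are hypothetical helper names, and nothing here reconciles inconsistent abbreviations, so "L Freeman" and "LC Freeman" would still count as different people.

```r
# Sketch: one row per (pubid, author), assuming comma-separated authors.
split_authors <- function(pubid, authors) {
  parts <- strsplit(as.character(authors), ",", fixed = TRUE)
  # trim whitespace and drop empty or missing entries
  parts <- lapply(parts, function(x) {
    x <- trimws(x)
    x[!is.na(x) & nzchar(x)]
  })
  data.frame(pubid  = rep(as.character(pubid), lengths(parts)),
             author = as.character(unlist(parts)),
             stringsAsFactors = FALSE)
}

# Sketch: every unordered pair of authors that share a publication key.
edge_list <- function(long) {
  pairs <- lapply(split(long$author, long$pubid), function(a) {
    a <- sort(unique(a))
    if (length(a) < 2) return(NULL)  # a single author yields no edge
    t(combn(a, 2))                   # one row per author pair
  })
  edges <- do.call(rbind, unname(pairs))
  if (is.null(edges)) {
    return(data.frame(from = character(0), to = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(from = edges[, 1], to = edges[, 2], stringsAsFactors = FALSE)
}
```

Because the same pubid can apparently refer to different publications, it may be safer to pass a combined key such as `paste(scholar_id, pubid)` as the `pubid` argument.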
