Extract elements from different levels of a nested list

Time: 2018-08-31 15:29:41

Tags: r list nested

I have a nested list of academic authors, for example:

> str(content)
List of 3
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
  .. .. ..$ document-count: chr "6"
  .. .. ..$ cited-by-count: chr "13"
  .. ..$ h-index       : chr "3"
  .. ..$ coauthor-count: chr "7"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "García Cruz"
  .. .. ..$ given-name: chr "Gustavo Adolfo"
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
  .. .. ..$ document-count: chr "4"
  .. .. ..$ cited-by-count: chr "21"
  .. ..$ h-index       : chr "3"
  .. ..$ coauthor-count: chr "5"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "Akimov"
  .. .. ..$ given-name: chr "Alexey"
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
  .. .. ..$ document-count: chr "10"
  .. .. ..$ cited-by-count: chr "117"
  .. ..$ h-index       : chr "6"
  .. ..$ coauthor-count: chr "7"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "Alecke"
  .. .. ..$ given-name: chr "Björn"

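I am interested in extracting the following values:

dc:identifier, document-count, cited-by-count, h-index, coauthor-count, surname, given-name

and then parsing them into a data-frame-like structure.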

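I have two problems. The first is that I cannot access the different levels of the list. Although content[[3]] returns the elements of the third sublist/author, I have not found a way to access the sublists of the third author, i.e.: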

> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds

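The second is that, once I can access it, I imagine I cannot simply use sapply, because the elements I want to parse from the list are not all at the same level.

Here is the dput of the first three elements of the list: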

structure(list(`author-retrieval-response` = list(structure(list(
    `@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6", 
        `cited-by-count` = "13"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7", 
    `preferred-name` = structure(list(surname = "García Cruz", 
        `given-name` = "Gustavo Adolfo"), .Names = c("surname", 
    "given-name"))), .Names = c("@status", "@_fa", "coredata", 
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
    structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4", 
        `cited-by-count` = "21"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5", 
        `preferred-name` = structure(list(surname = "Akimov", 
            `given-name` = "Alexey"), .Names = c("surname", "given-name"
        ))), .Names = c("@status", "@_fa", "coredata", "h-index", 
    "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
    structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10", 
        `cited-by-count` = "117"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7", 
        `preferred-name` = structure(list(surname = "Alecke", 
            `given-name` = "Björn"), .Names = c("surname", "given-name"
        ))), .Names = c("@status", "@_fa", "coredata", "h-index", 
    "coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response", 
"author-retrieval-response", "author-retrieval-response"))

Many thanks for your help!

1 answer:

Answer 0 (score: 2)

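Consider rapply (recursive apply) inside an lapply over the three top-level elements to flatten all nested child and grandchild elements of each author. Then transpose each result with t() and pass it to the data.frame() constructor. Here my_list stands for the list called content in the question.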

flat_list <- lapply(my_list, function(x) data.frame(t(rapply(x, function(x) x[1]))))

final_df <- do.call(rbind, unname(flat_list))
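
To see what the one-liner does, it can help to run the steps on a single author. The following is a minimal sketch, not part of the original answer: make_author is a hypothetical helper that builds a small stand-in list with the same shape as the dput in the question, and flat_vec and one_row are illustrative names.

```r
# Hypothetical helper (not from the question): builds one top-level element
# with the same shape as the dput, i.e. a list of length 1 wrapping the record.
make_author <- function(id, docs, cites, h, co, surname, given) {
  list(list(
    `@status` = "found", `@_fa` = "true",
    coredata = list(`dc:identifier` = id, `document-count` = docs,
                    `cited-by-count` = cites),
    `h-index` = h, `coauthor-count` = co,
    `preferred-name` = list(surname = surname, `given-name` = given)
  ))
}

# Small stand-in for the question's list (two of the three authors)
my_list <- list(
  `author-retrieval-response` =
    make_author("AUTHOR_ID:56595713900", "4", "21", "3", "5", "Akimov", "Alexey"),
  `author-retrieval-response` =
    make_author("AUTHOR_ID:12792624600", "10", "117", "6", "7", "Alecke", "Björn")
)

# Step 1: rapply visits every leaf; with the default how = "unlist" it
# returns one named character vector, with names joined by "."
flat_vec <- rapply(my_list[[1]], function(x) x[1])
print(names(flat_vec))

# Step 2: t() turns the vector into a 1-row matrix, and data.frame() turns
# that into a 1-row data frame (column names are made syntactically valid)
one_row <- data.frame(t(flat_vec))
print(names(one_row))
```

Because the leaves sit at different depths, rapply is what makes a single call sufficient here; the depth of each leaf only shows up in the dotted column names.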

Output

final_df

#   X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1    found  true  AUTHOR_ID:55604964500                       6                      13       3              7            García Cruz            Gustavo Adolfo
# 2    found  true  AUTHOR_ID:56595713900                       4                      21       3              5                 Akimov                    Alexey
# 3    found  true  AUTHOR_ID:12792624600                      10                     117       6              7                 Alecke                     Björn
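
As for the "subscript out of bounds" error in the question: each top-level element is a list of length 1 that wraps the actual record, so the record is at content[[3]][[1]], and there is no content[[3]][[2]]. If only the seven requested fields are wanted, explicit indexing is an alternative to flattening everything. The sketch below is an assumption-laden illustration rather than the answer's method: make_author and extract_author are hypothetical helpers, the sample list is a two-author stand-in for the question's content, and the conversion of counts to integer is an added choice.

```r
# Hypothetical helper: one top-level element shaped like the question's dput
make_author <- function(id, docs, cites, h, co, surname, given) {
  list(list(
    `@status` = "found", `@_fa` = "true",
    coredata = list(`dc:identifier` = id, `document-count` = docs,
                    `cited-by-count` = cites),
    `h-index` = h, `coauthor-count` = co,
    `preferred-name` = list(surname = surname, `given-name` = given)
  ))
}

# Two-author stand-in for the question's `content`
content <- list(
  `author-retrieval-response` =
    make_author("AUTHOR_ID:56595713900", "4", "21", "3", "5", "Akimov", "Alexey"),
  `author-retrieval-response` =
    make_author("AUTHOR_ID:12792624600", "10", "117", "6", "7", "Alecke", "Björn")
)

# Each element has length 1, which is why [[2]] is out of bounds
print(length(content[[2]]))

# Hypothetical helper: pull the seven requested fields out of one element
extract_author <- function(el) {
  rec <- el[[1]]  # unwrap the length-1 list to reach the record
  data.frame(
    identifier     = rec[["coredata"]][["dc:identifier"]],
    document_count = as.integer(rec[["coredata"]][["document-count"]]),
    cited_by_count = as.integer(rec[["coredata"]][["cited-by-count"]]),
    h_index        = as.integer(rec[["h-index"]]),
    coauthor_count = as.integer(rec[["coauthor-count"]]),
    surname        = rec[["preferred-name"]][["surname"]],
    given_name     = rec[["preferred-name"]][["given-name"]],
    stringsAsFactors = FALSE
  )
}

# One row per author, stacked into a single data frame
authors_df <- do.call(rbind, lapply(unname(content), extract_author))
print(authors_df)
```

Compared with the rapply approach, this drops @status and @_fa and gives numeric count columns, at the cost of hard-coding the field paths.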