I have a nested list of academic authors, for example:
> str(content)
List of 3
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
.. .. ..$ document-count: chr "6"
.. .. ..$ cited-by-count: chr "13"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "García Cruz"
.. .. ..$ given-name: chr "Gustavo Adolfo"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
.. .. ..$ document-count: chr "4"
.. .. ..$ cited-by-count: chr "21"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "5"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Akimov"
.. .. ..$ given-name: chr "Alexey"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
.. .. ..$ document-count: chr "10"
.. .. ..$ cited-by-count: chr "117"
.. ..$ h-index : chr "6"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Alecke"
.. .. ..$ given-name: chr "Björn"
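I am interested in extracting the following values:
dc:identifier, document-count, cited-by-count, h-index, coauthor-count, surname, given-name
and then parsing them into a data-frame-like structure.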
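I have two problems. The first is that I cannot access the different levels of the list. Although content[[3]] returns the elements of the third sub-list (the third author), I have not found a way to access the sub-lists of that author, i.e.: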
> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds
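The second is that, even once I can access it, I imagine I cannot simply use sapply, because the elements I want to extract are not all at the same level.
Below is the dput of the first three elements of the list: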
structure(list(`author-retrieval-response` = list(structure(list(
`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6",
`cited-by-count` = "13"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "García Cruz",
`given-name` = "Gustavo Adolfo"), .Names = c("surname",
"given-name"))), .Names = c("@status", "@_fa", "coredata",
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4",
`cited-by-count` = "21"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5",
`preferred-name` = structure(list(surname = "Akimov",
`given-name` = "Alexey"), .Names = c("surname", "given-name"
))), .Names = c("@status", "@_fa", "coredata", "h-index",
"coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10",
`cited-by-count` = "117"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "Alecke",
`given-name` = "Björn"), .Names = c("surname", "given-name"
))), .Names = c("@status", "@_fa", "coredata", "h-index",
"coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response",
"author-retrieval-response", "author-retrieval-response"))
Many thanks for your help!
Answer 0 (score: 2)
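Consider using rapply (recursively apply a function) inside an lapply that runs over the three top-level elements, so that all nested child and grandchild elements of each author are flattened into a single named character vector. Then transpose that vector with t() to get a one-row matrix, pass it to the data.frame() constructor, and stack the resulting rows with do.call(rbind, ...).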
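# my_list is the nested list from the question (there called content)
flat_list <- lapply(my_list, function(x) data.frame(t(rapply(x, function(x) x[1]))))
# Drop the duplicated list names so the rows are numbered 1, 2, 3
final_df <- do.call(rbind, unname(flat_list))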
Output
final_df
# X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1 found true AUTHOR_ID:55604964500 6 13 3 7 García Cruz Gustavo Adolfo
# 2 found true AUTHOR_ID:56595713900 4 21 3 5 Akimov Alexey
# 3 found true AUTHOR_ID:12792624600 10 117 6 7 Alecke Björn
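On the first problem in the question: the str() output shows that each top-level element is a "List of 1", so content[[3]][[2]] is out of bounds. The author's fields sit one level further down, under content[[3]][[1]]. Note also that all top-level elements share the name author-retrieval-response, so indexing by that name only ever reaches the first one; positional indexing is safer. The sketch below illustrates this on a two-author stand-in built from values in the question, and then pulls out only the seven requested fields. The helper extract_author and the column names are hypothetical, chosen for illustration; converting the counts with as.integer is an assumption about how the result will be used.

```r
# Stand-in with the same shape as `content` in the question (two authors only).
content <- list(
  `author-retrieval-response` = list(list(
    `@status` = "found", `@_fa` = "true",
    coredata = list(`dc:identifier` = "AUTHOR_ID:56595713900",
                    `document-count` = "4", `cited-by-count` = "21"),
    `h-index` = "3", `coauthor-count` = "5",
    `preferred-name` = list(surname = "Akimov", `given-name` = "Alexey")
  )),
  `author-retrieval-response` = list(list(
    `@status` = "found", `@_fa` = "true",
    coredata = list(`dc:identifier` = "AUTHOR_ID:12792624600",
                    `document-count` = "10", `cited-by-count` = "117"),
    `h-index` = "6", `coauthor-count` = "7",
    `preferred-name` = list(surname = "Alecke", `given-name` = "Björn")
  ))
)

# Each top-level element has length 1, so step through [[1]] first,
# then index by name (backticks are needed for names with "-" or ":").
author <- content[[1]][[1]]
author$coredata$`dc:identifier`
author[["preferred-name"]][["surname"]]

# Hypothetical helper: collect the seven fields of interest from one entry.
extract_author <- function(entry) {
  a <- entry[[1]]
  data.frame(
    identifier     = a$coredata$`dc:identifier`,
    document_count = as.integer(a$coredata$`document-count`),
    cited_by_count = as.integer(a$coredata$`cited-by-count`),
    h_index        = as.integer(a$`h-index`),
    coauthor_count = as.integer(a$`coauthor-count`),
    surname        = a$`preferred-name`$surname,
    given_name     = a$`preferred-name`$`given-name`,
    stringsAsFactors = FALSE
  )
}

# unname() avoids carrying the duplicated list names into the row names.
authors_df <- do.call(rbind, lapply(unname(content), extract_author))
```

Compared with the rapply approach, this version names every field explicitly, so it will fail with an error if an entry is missing one of them, whereas rapply simply flattens whatever is present.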