I have a list of lists; let's call it mat. I want to convert it into a data frame.
Here is some sample content:
[[14]][[1000]][[1]]
[1] 51
[[14]][[1000]][[2]]
[1] 10
[[14]][[1000]][[3]]
[1] "C Hou" "C Han"
[[14]][[1000]][[4]]
[1] "Communication Middleware and Software for QoS Control in Distributed Real-Time EnvironmentsSpecifically, we consider the following innovative research components "
[[14]][[1000]][[5]]
[1] "COMPSAC International Computer Software and Applications Conference"
These are: paper ID, author ID, co-author names, paper title, and journal title.
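This large list was generated from 14 text files. I happened to pick the last one to print to the console, hence the "first" index [[14]]. The "second" index, [[1000]], refers to the 1000th entry (record) in that text file. The third index, [[1]] through [[5]], selects the "column" (paper ID, author ID, co-author names, paper title, and journal title).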
So far I have tried several things without luck. Whenever I try to convert it to a data frame, I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
I also tried x = mat[[1]] to extract a single list of lists, i.e. the entries from the first text file. I can't even view that: View(x) produces the same error:
Error in View : arguments imply differing number of rows: 1, 0
I am completely lost on how to turn this large list into a data frame I can work with. Thanks.
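We can't see the real data, so this is an assumption. One likely cause of the error is an entry with an empty field, for example a paper with no co-authors stored as character(0). A zero-length field has 0 rows while the other fields have 1, which produces "differing number of rows: 1, 0". The sketch below reproduces the error on an invented entry. It then uses a hypothetical helper, find_empty_fields(), to list the file/entry positions that contain such fields.

```r
# Invented entries: the second one has an empty (zero-length) co-author field
entry_ok    <- list(51, 10, c("C Hou", "C Han"), "Some title", "Some journal")
entry_empty <- list(52, 11, character(0), "Another title", "Another journal")

# Converting the entry with the empty field reproduces the error from the question
err <- tryCatch(as.data.frame(entry_empty), error = function(e) conditionMessage(e))
print(err)

# Hypothetical helper: return a (file, entry) matrix of entries with a zero-length field
find_empty_fields <- function(mat) {
  hits <- list()
  for (i in seq_along(mat)) {
    for (j in seq_along(mat[[i]])) {
      if (any(lengths(mat[[i]][[j]]) == 0)) {
        hits[[length(hits) + 1]] <- c(file = i, entry = j)
      }
    }
  }
  do.call(rbind, hits)  # NULL when no empty fields are found
}

mat_demo <- list(list(entry_ok, entry_empty))
find_empty_fields(mat_demo)
```

Run on the real mat, the helper shows which records need an empty field replaced, e.g. by "" or NA, before conversion.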
Answer 0 (score: 1)
You can use a nested lapply to process each nested list, like this:
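papers <- do.call(rbind, lapply(mat, function(txtfile) {
  # rbind the one-row data frames of all entries in this text file
  do.call(rbind, lapply(txtfile, function(entry) {
    # to handle multiple coauthors, paste each field into a single string
    l <- lapply(entry, function(eachcol) {
      paste(eachcol, collapse = ", ")
    })
    # give every entry the same column names so rbind() can match them up
    names(l) <- paste0("V", seq_along(l))
    as.data.frame(l, stringsAsFactors = FALSE)
  }))
}))
names(papers) <- c("paper ID", "author ID", "coauthor names", "paper title", "journal title")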
I don't have the data to test this, so if it still fails, give me a shout.
A related question: why not read the text files in as data.frames instead of lists?
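Since the real data isn't available, here is a minimal sketch of the nested-lapply idea on a tiny hypothetical mat_toy; all values are invented. Two details make it work. First, paste(..., collapse = ", ") turns a multi-author vector into one string and an empty field into "". Second, every one-row data frame gets the same column names, so rbind() can line the columns up.

```r
# Tiny hypothetical stand-in for `mat`: 2 text files, 2 entries each
mat_toy <- list(
  list(list(51, 10, c("C Hou", "C Han"), "Title A", "Journal A"),
       list(52, 11, "D Li", "Title B", "Journal B")),
  list(list(53, 12, character(0), "Title C", "Journal C"),
       list(54, 13, c("E Wu", "F Xu"), "Title D", "Journal D"))
)

col_names <- c("paper ID", "author ID", "coauthor names", "paper title", "journal title")

# One entry -> one-row data frame; each field collapsed to a single string
entry_to_df <- function(entry) {
  fields <- lapply(entry, paste, collapse = ", ")
  names(fields) <- col_names  # identical names across entries for rbind()
  as.data.frame(fields, stringsAsFactors = FALSE, check.names = FALSE)
}

papers <- do.call(rbind, lapply(mat_toy, function(txtfile) {
  do.call(rbind, lapply(txtfile, entry_to_df))
}))
papers
```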
Answer 1 (score: 1)
I tried to recreate some sample data that matches your data structure (I hope I got it right):
## Create sample data:
createList <- function(j){
nElem <- 5
paperIDVec <- sample.int(1000, nElem, replace = FALSE)
authorIDVec <- sample.int(1000, nElem, replace = FALSE)
coauthorsList <- lapply(1:nElem, function(ii){
paste("Coauthor", 1:sample.int(3, 1))
})
paperTitleVec <- paste("Some brilliant idea that author", authorIDVec, "had")
journalVec <- vapply(1:nElem, function(ii) paste("Journal",
paste(LETTERS[sample.int(26, 3, replace = TRUE)], collapse = "")), character(1))
outList <- lapply(1:nElem, function(ii){
list(paperIDVec[ii], authorIDVec[ii],
coauthorsList[[ii]], paperTitleVec[ii],
journalVec[ii])
})
}
mat <- lapply(1:4, createList)
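Using this data and following @chinsoon12's approach, I first paste the entries of each field together into a single string. For example, a vector of three co-authors, c("Mr. X", "Mrs. J", "Mr. M"), becomes "Mr. X, Mrs. J, Mr. M". I then convert each entry to a data frame and combine them row-wise into one big data frame: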
## Turn nested list into one data frame:
textFileDfList <- lapply(mat, function(listLevel2) {
## Convert list on second level of hierarchy (= one text file)
## to a list of data frames (one for each entry)
dataFrameList <- lapply(listLevel2, function(listLevel3){
## Paste multiple entries (e.g. vector of co-authors)
## together to create a single character entry:
simplifiedList <- lapply(listLevel3,
function(entries) paste(entries, collapse = ", "))
## Create data.frame:
outDf <- as.data.frame(simplifiedList,
stringsAsFactors = FALSE,
col.names = c("paper ID", "author ID", "coauthor names",
"paper title", "journal title"))
})
## Combine data frames of the single entries to one data frame,
## containing all entries of the text file:
textFileDf <- do.call('rbind', dataFrameList)
})
## Combine data frames of the text files to one big data frame:
bigDataFrame <- do.call('rbind', textFileDfList)
> head(bigDataFrame)
paper.ID author.ID coauthor.names
1 862 990 Coauthor 1, Coauthor 2
2 688 400 Coauthor 1
3 921 963 Coauthor 1, Coauthor 2, Coauthor 3
4 479 455 Coauthor 1, Coauthor 2
5 709 340 Coauthor 1
6 936 591 Coauthor 1, Coauthor 2
paper.title journal.title
1 Some brilliant idea that author 990 had Journal PZR
2 Some brilliant idea that author 400 had Journal MQD
3 Some brilliant idea that author 963 had Journal WFW
4 Some brilliant idea that author 455 had Journal TZV
5 Some brilliant idea that author 340 had Journal DCR
6 Some brilliant idea that author 591 had Journal EGW
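As an alternative sketch (my own variation, not part of the approach above), you can flatten mat by one level and build each column in a single vapply() pass. This avoids creating one data frame per entry and rbind-ing them, which typically scales better with 14 files of thousands of records. mat_toy and the helper field_as_string() are hypothetical; substitute your own list.

```r
# Tiny hypothetical stand-in for `mat` (text file -> entry -> 5 fields);
# replace mat_toy with your own nested list
mat_toy <- list(
  list(list(862, 990, c("Coauthor 1", "Coauthor 2"), "Title 1", "Journal PZR"),
       list(688, 400, "Coauthor 1", "Title 2", "Journal MQD")),
  list(list(921, 963, character(0), "Title 3", "Journal WFW"))
)

# Flatten one level: a single list with one element per entry (paper)
entries <- unlist(mat_toy, recursive = FALSE)

# Hypothetical helper: field k of every entry, each collapsed to one string
field_as_string <- function(k) {
  vapply(entries, function(e) paste(e[[k]], collapse = ", "), character(1))
}

# Build each column in one pass instead of rbind-ing one-row data frames
bigDataFrame2 <- data.frame(
  paper.ID       = field_as_string(1),
  author.ID      = field_as_string(2),
  coauthor.names = field_as_string(3),
  paper.title    = field_as_string(4),
  journal.title  = field_as_string(5),
  stringsAsFactors = FALSE
)
bigDataFrame2
```

As in the answers above, all columns come out as character. Use as.integer() on the ID columns if you need numbers.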