Dataset

Time: 2017-04-06 01:00:17

Tags: r list dataframe

I have a list of lists; let's call it mat. I want to convert it to a data frame.

Here is some sample content.

[[14]][[1000]]
[[14]][[1000]][[1]]
[1] 51

[[14]][[1000]][[2]]
[1] 10

[[14]][[1000]][[3]]
[1] "C Hou" "C Han"

[[14]][[1000]][[4]]
[1] "Communication Middleware and Software for QoS Control in Distributed Real-Time EnvironmentsSpecifically, we consider the following innovative research components "

[[14]][[1000]][[5]]
[1] "COMPSAC International Computer Software and Applications Conference"

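The fields are: paper ID, author ID, coauthor names, paper title, and journal title.

This large list was generated from 14 text files. I happened to print the last file to the console, hence the first index [[14]]. The second index, [[1000]], refers to the 1000th entry or record in that text file, and [[1]] indexes the "column name" (paper ID, author ID, coauthor names, paper title, and journal title).

I have tried a few things so far without luck; whenever I try to convert it to a data frame, I get the error: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0

Also, when I use x = mat[[1]] to extract one list of lists, i.e. the list from the first text file, I cannot even "view" it. View(x) produces the same error: Error in View : arguments imply differing number of rows: 1, 0

I am completely lost on how to convert this large list into a data frame I can work with. Thanks.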

2 answers:

Answer 0: (score: 1)

You can use nested lapply calls to process each nested list, as follows:

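papers <- do.call(rbind, lapply(mat, function(txtfile) {
    # one data frame per text file: row-bind the one-row data frames of its entries
    do.call(rbind, lapply(txtfile, function(entry) {
        # collapse each field (e.g. multiple coauthors) into a single string,
        # so that every field has length 1, including empty ones
        l <- lapply(entry, function(eachcol) {
            paste(eachcol, collapse = ", ")
        })

        # identical column names in every row let rbind match the columns
        df <- as.data.frame(l, stringsAsFactors = FALSE,
            col.names = c("paper ID", "author ID", "coauthor names", "paper title", "journal title"))
        df
    }))
}))
names(papers) <- c("paper ID", "author ID", "coauthor names", "paper title", "journal title")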

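I don't have data to test this; give me a shout if it still fails.

A related question: why not read the text files in as data.frames rather than lists?

Answer 1: (score: 1)

I tried to recreate some sample data that matches your data structure (I hope I got it right):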
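The error in the question ("differing number of rows: 1, 0") is what data.frame() reports when one field has length 1 and another has length 0. A minimal sketch of that situation, assuming the culprit is an entry with an empty coauthor field (the entry below is hypothetical, not taken from the real data):

```r
# Hypothetical entry whose coauthor field is empty (length 0)
entry <- list(51L, 10L, character(0), "Some title", "Some journal")

# The field lengths differ: four fields of length 1, one of length 0
print(lengths(entry))

# Converting directly fails; capture the error message instead of stopping
res <- tryCatch(as.data.frame(entry), error = function(e) conditionMessage(e))
print(res)

# Collapsing every field to a single string gives each field length 1
# (an empty field becomes the empty string "")
fixed <- lapply(entry, paste, collapse = ", ")
row <- as.data.frame(fixed, stringsAsFactors = FALSE,
                     col.names = c("paper ID", "author ID", "coauthor names",
                                   "paper title", "journal title"))
print(row)
```

If the real data has a different cause, running lengths() on a few entries of mat should show which field is off.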

## Create sample data:
createList <- function(j){
    nElem <- 5
    paperIDVec <- sample.int(1000, nElem, replace = FALSE) 
    authorIDVec <- sample.int(1000, nElem, replace = FALSE) 
    coauthorsList <- lapply(1:nElem, function(ii){
                paste("Coauthor", 1:sample.int(3, 1))           
            })
    paperTitleVec <- paste("Some brilliant idea that author", authorIDVec, "had")
    journalVec <- vapply(1:nElem, function(ii) paste("Journal", 
                        paste(LETTERS[sample.int(26, 3, replace = TRUE)], collapse = "")), character(1))
    outList <- lapply(1:nElem, function(ii){
                list(paperIDVec[ii], authorIDVec[ii],
                        coauthorsList[[ii]], paperTitleVec[ii],
                        journalVec[ii])         
            })
    ## Return the list of entries explicitly:
    outList
}
mat <- lapply(1:4, createList)

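Using this data and following @chinsoon12's approach, I first paste the values within each field together so that every field becomes a single character string (for example, a vector of three coauthors c("Mr. X", "Mrs. J", "Mr. M") becomes "Mr. X, Mrs. J, Mr. M"), then convert each entry to a data frame and row-bind them to create one big data frame: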

## Turn nested list into one data frame:
textFileDfList <- lapply(mat, function(listLevel2) {            
            ## Convert list on second level of hierarchy (= one text file)
            ## to a list of data frames (one for each entry)            
            dataFrameList <- lapply(listLevel2, function(listLevel3){
                        ## Paste multiple entries (e.g. vector of co-authors)
                        ## together to create a single character entry:
                        simplifiedList <- lapply(listLevel3, 
                                function(entries) paste(entries, collapse = ", "))
                        ## Create data.frame:
                        outDf <- as.data.frame(simplifiedList, 
                                stringsAsFactors = FALSE, 
                                col.names =  c("paper ID", "author ID", "coauthor names", 
                                        "paper title", "journal title"))                                                    
                    })

            ## Combine data frames of the single entries to one data frame,
            ## containing all entries of the text file:
            textFileDf <- do.call('rbind', dataFrameList)           
        })
## Combine data frames of the text files to one big data frame:
bigDataFrame <- do.call('rbind', textFileDfList)

> head(bigDataFrame)
  paper.ID author.ID                     coauthor.names
1      862       990             Coauthor 1, Coauthor 2
2      688       400                         Coauthor 1
3      921       963 Coauthor 1, Coauthor 2, Coauthor 3
4      479       455             Coauthor 1, Coauthor 2
5      709       340                         Coauthor 1
6      936       591             Coauthor 1, Coauthor 2
                              paper.title journal.title
1 Some brilliant idea that author 990 had   Journal PZR
2 Some brilliant idea that author 400 had   Journal MQD
3 Some brilliant idea that author 963 had   Journal WFW
4 Some brilliant idea that author 455 had   Journal TZV
5 Some brilliant idea that author 340 had   Journal DCR
6 Some brilliant idea that author 591 had   Journal EGW
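One side effect of paste() is that every column, including the two ID columns, ends up as character. A small sketch of converting the IDs back to integers and splitting the coauthor string again when needed; exampleDf is a hypothetical stand-in that copies two rows from the output above, so the block runs on its own:

```r
## Stand-in for bigDataFrame: same column names, IDs stored as character
## (which is how paste() leaves them)
exampleDf <- data.frame(
    paper.ID = c("862", "688"),
    author.ID = c("990", "400"),
    coauthor.names = c("Coauthor 1, Coauthor 2", "Coauthor 1"),
    stringsAsFactors = FALSE)

## Convert the ID columns back to integer:
idCols <- c("paper.ID", "author.ID")
exampleDf[idCols] <- lapply(exampleDf[idCols], as.integer)

## Recover one character vector of coauthors per row, if needed:
coauthorList <- strsplit(exampleDf$coauthor.names, ", ", fixed = TRUE)

str(exampleDf)
```

Splitting on ", " assumes that no coauthor name contains that separator itself; a different collapse string may be safer for real names.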