Trying to create a data frame dynamically in R

Asked: 2015-12-30 02:04:08

Tags: r dataframe

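I have a vector of strings that I want to use as the column headers of a data frame.

Example: cols <- c("A: Ike (N=428)", "F: Mike (N=691)", "G: Bike (N=380)", "Total (N=1499)", "p value")

I have a list of lists of strings that I want to add as the data in the data frame.

Example, first three rows: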

[[1]]
[[1]]$Female
[[1]]$Female[[1]]
[1] "151"   "35.3%"

[[1]]$`Age in Years`
[[1]]$`Age in Years`[[1]]
NULL

[[1]]$`Mean (SD)`
[[1]]$`Mean (SD)`[[1]]
[1] "59.7" "11.4"

[[2]]
[[2]]$Female
[[2]]$Female[[1]]
[1] "280"   "40.5%"

[[2]]$`Age in Years`
[[2]]$`Age in Years`[[1]]
NULL

[[2]]$`Mean (SD)`
[[2]]$`Mean (SD)`[[1]]
[1] "60.3" "11.6"

[[3]]
[[3]]$Female
[[3]]$Female[[1]]
[1] "152" "40%"

[[3]]$`Age in Years`
[[3]]$`Age in Years`[[1]]
NULL

[[3]]$`Mean (SD)`
[[3]]$`Mean (SD)`[[1]]
[1] "59.8" "11.5"

[[4]]
[[4]]$Female
[[4]]$Female[[1]]
[1] "583"   "38.9%"

[[4]]$`Age in Years`
[[4]]$`Age in Years`[[1]]
NULL

[[4]]$`Mean (SD)`
[[4]]$`Mean (SD)`[[1]]
[1] "60"   "11.5"

[[5]]
[[5]]$Female
[[5]]$Female[[1]]
[1] "0.190"

[[5]]$`Age in Years`
[[5]]$`Age in Years`[[1]]
[1] "0.614"

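In other words, I want a data frame in which column 1 is named names[1] (cols[1] in the example above) and is made up of frameLists[[1]].

Following the suggestions below, I changed my code to the following: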

outFrame <- do.call(data.frame, c(frameLists, stringsAsFactors = FALSE))
colnames(outFrame) <- cols

The result comes back looking like this:

  A: Ike (N=428) F: Mike (N=691) G: Bike (N=380) Total (N=1499) p value   NA     NA   NA    NA    NA 
1            151            59.7             280           60.3     152 59.8    583   60 0.190 0.614 
2          35.3%            11.4           40.5%           11.6     40%  1.5  38.9% 11.5 0.190 0.614

The result I want:

  A: Ike (N=428) F: Mike (N=691) G: Bike (N=380) Total (N=1499) p value
1     151, 35.3%      280, 40.5%        152, 40%     583, 38.9%   0.190
2                                                                 0.614
3     59.7, 11.4      60.3, 11.6      59.8, 11.5       60, 11.5        

3 answers:

Answer 0 (score: 2)

Assuming the number of strings is the same in every list, try

result <- do.call(data.frame, c(lapply(frameLists, unlist), stringsAsFactors = FALSE))
names(result) <- name

Sample data for the list of lists (not sure whether this is what you mean; if not, please provide sample data) and the vector of names

frameLists <- list(list(c("asd", "faf"), NULL, c("3", "2")), list(c("aaa", "zzz"),NULL, c("1", "3")), list(c("qw", "gs"), NULL, c("3", "2")))
name <- c("a", "b", "c")

Output

> result
    a   b  c
1 asd aaa qw
2 faf zzz gs
3   3   1  3
4   2   3  2
> str(result)
'data.frame':   4 obs. of  3 variables:
 $ a: chr  "asd" "faf" "3" "2"
 $ b: chr  "aaa" "zzz" "1" "3"
 $ c: chr  "qw" "gs" "3" "2"

Another possible interpretation of the same input (not sure which output you want):

res <- as.data.frame(do.call(cbind, lapply(frameLists, function(x) do.call(cbind, x))), stringsAsFactors = FALSE)

Output

> res
   V1 V2  V3 V4 V5 V6
1 asd  3 aaa  1 qw  3
2 faf  2 zzz  3 gs  2
> str(res)
'data.frame':   2 obs. of  6 variables:
 $ V1: chr  "asd" "faf"
 $ V2: chr  "3" "2"
 $ V3: chr  "aaa" "zzz"
 $ V4: chr  "1" "3"
 $ V5: chr  "qw" "gs"
 $ V6: chr  "3" "2"
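
The equal-length assumption in the first approach matters more than it may seem. The sketch below uses hypothetical data (the third element is deliberately shorter) to show what happens when the flattened lengths differ, and one possible workaround that pads with NA. Note also that data.frame recycles a shorter atomic vector when its length divides the longest one evenly, so a mismatch is not always reported as an error.

```r
# Hypothetical data: the third outer element flattens to 3 strings, not 4.
frameLists <- list(
  list(c("asd", "faf"), NULL, c("3", "2")),
  list(c("aaa", "zzz"), NULL, c("1", "3")),
  list(c("qw", "gs"), NULL, c("3"))
)

# unlist() drops NULL entries, so each column is only as long as its strings.
flat <- lapply(frameLists, unlist)
flat_lengths <- lengths(flat)

# data.frame() rejects columns whose lengths cannot be recycled to match.
err <- tryCatch(
  do.call(data.frame, c(flat, stringsAsFactors = FALSE)),
  error = function(e) conditionMessage(e)
)
print(err)

# One possible workaround: pad every column with NA up to the longest length.
padded <- lapply(flat, function(x) {
  length(x) <- max(flat_lengths)
  x
})
result <- do.call(data.frame, c(padded, stringsAsFactors = FALSE))
names(result) <- c("a", "b", "c")
print(result)
```

Whether NA padding is appropriate depends on what the rows are supposed to mean; if rows must line up by name rather than by position, the columns need to be aligned explicitly instead.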

Answer 1 (score: 2)

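Your code doesn't work because you initialize results as an empty data frame, which R creates as a data frame with 0 rows and 0 columns. When you add a column to a data frame, its number of rows must match the existing frame. That is why you get the error message replacement has 2 rows, data has 0.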
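
A minimal sketch that reproduces the error described above. The variable names and the column values are hypothetical; only the error message comes from the question.

```r
# An empty data frame has 0 rows and 0 columns.
results <- data.frame()
dims <- dim(results)

# Assigning a 2-element column fails because 2 rows do not match 0 rows.
msg <- tryCatch(
  {
    results$a <- c("x", "y")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Starting from a frame that already has 2 rows, the same assignment works.
ok <- data.frame(a = c("x", "y"), stringsAsFactors = FALSE)
ok$b <- c("1", "2")
```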

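It is easier to bind a list of columns together into a data frame. The problem is that the data.frame function doesn't want a list; it wants each column as a separate argument: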

data.frame(c(1,2,3),c(4,5,6),c(34,1,1))

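How do you get data.frame to take a list of columns instead of multiple arguments?

This is what do.call is for!

do.call takes a function and a list of arguments, and passes all of the arguments to the function in a single call.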

colList <- list(c(1,2,3),c(4,5,6),c(34,1,1))
col_names <- c('a','b','c')
df <- do.call(data.frame,colList)
colnames(df) <- col_names

Result:

> df
  a b  c
1 1 4 34
2 2 5  1
3 3 6  1

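The same applies if colList is a list of character vectors, but you may want to use stringsAsFactors = FALSE to keep data.frame from converting them to factors (that conversion was the default before R 4.0.0).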
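
As a sketch of the character-vector case, the example below uses a named list so that the column names come from the list itself. The values are a hypothetical excerpt modeled on the question's data. It also assumes you want to keep names containing spaces and punctuation, which is why check.names = FALSE is passed; without it, data.frame rewrites such names into syntactically valid ones.

```r
# Hypothetical excerpt: a named list of character vectors, one per column.
colList <- list(
  "A: Ike (N=428)"  = c("151", "35.3%"),
  "F: Mike (N=691)" = c("280", "40.5%"),
  "p value"         = c("0.190", "0.614")
)

# Extra named arguments are appended to the argument list for data.frame().
df <- do.call(
  data.frame,
  c(colList, stringsAsFactors = FALSE, check.names = FALSE)
)
str(df)
```

With a named list, the separate colnames(df) <- ... step is no longer needed.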

Answer 2 (score: 0)

I built the following to meet my needs. It's clunky, but so far it works. First, the output:

> myDF
             A: Ike (N=428)   F: Mike (N=691) G: Bike (N=380) Total (N=1499) p value
Female           151, 35.3%        280, 40.5%        152, 40%     583, 38.9%   0.190
Age in Years                                                                   0.614
Mean (SD)        59.7, 11.4        60.3, 11.6      59.8, 11.5       60, 11.5        
Q1, Q3               53, 68            52, 69          52, 68         52, 68        
Range                27, 88            19, 88          26, 85         19, 88        

Now the code that generates it:

#' Make a data.frame given the column headers and data to fill the data.frame
#' 
#' @param cols          Vector of text holding the column names
#' @param frameLists    List of lists holding the data for the data frame.  First list element 
#' must have all the names used in frameLists. Must be as many lists in frameLists as there are 
#' Strings in cols
#' @returnType  Data Frame
#' @return  Data Frame with all the elements set up and filled in
buildFrame <- function (cols, frameLists) {
    outList <- list()
    for (col in cols) {
        outList[[col]] <- NA
    }

    outFrame <- data.frame(outList, stringsAsFactors = FALSE)
    colnames(outFrame) <- cols

    outList <- list()
    for (col in cols) {
        outList[[col]] <- list()
    }

    theNames <- names(frameLists[[1]])
    whichCol <- 1
    for (topList in frameLists) {
        colList <- outList[[whichCol]]
        for (aName in theNames) {
            data <- topList[[aName]]
            if (is.null(data)) {
                colList[[aName]] <- ""
            }
            else {
                colList[[aName]] <- data
            }
        }
        outList[[whichCol]] <- colList
        whichCol <- whichCol + 1
    }

    outFrame <- rbind(outList, outFrame)
    outFrame <- outFrame[-1 - length(theNames), ]
    rownames(outFrame) <- theNames

    return(outFrame)
}
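
For comparison, here is a more compact sketch of the same idea; it is not the code above. It assumes, as buildFrame does, that the first element of frameLists carries all the row names. It collapses each cell to a single comma-separated string, which is an assumption about the desired display. The helper cellText and the shortened two-column input are hypothetical.

```r
# Shortened, hypothetical subset of the question's data (two columns only).
cols <- c("A: Ike (N=428)", "p value")
frameLists <- list(
  list(Female = list(c("151", "35.3%")),
       `Age in Years` = list(NULL),
       `Mean (SD)` = list(c("59.7", "11.4"))),
  list(Female = list("0.190"),
       `Age in Years` = list("0.614"))
)

# Row names come from the first column's list, as in buildFrame.
rowNames <- names(frameLists[[1]])

# Collapse one cell to a single string; missing or NULL cells become "".
cellText <- function(x) {
  values <- unlist(x)
  if (is.null(values)) "" else paste(values, collapse = ", ")
}

# Build one character vector per column, with one entry per row name.
columns <- lapply(frameLists, function(col) {
  vapply(rowNames, function(rn) cellText(col[[rn]]), character(1))
})
names(columns) <- cols

# check.names = FALSE keeps headers such as "A: Ike (N=428)" unchanged.
myDF <- do.call(
  data.frame,
  c(columns, list(row.names = rowNames,
                  stringsAsFactors = FALSE,
                  check.names = FALSE))
)
print(myDF)
```

Every cell here is a plain string, so the numeric values would have to be parsed again before any further computation.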