How to merge frequency tables with missing values?

Time: 2018-11-07 18:03:54

Tags: r

I have the following list of tables:

    list(structure(c(`0` = 19L, `1` = 2L, `3` = 43L), .Dim = 3L, .Dimnames = structure(list(
        c("0", "1", "3")), .Names = ""), class = "table"), structure(c(`0` = 7L, 
    `1` = 9L, `2` = 5L, `3` = 43L), .Dim = 4L, .Dimnames = structure(list(
        c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 14L, 
    `1` = 2L, `2` = 4L, `3` = 44L), .Dim = 4L, .Dimnames = structure(list(
        c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 21L, 
    `1` = 8L, `2` = 2L, `3` = 33L), .Dim = 4L, .Dimnames = structure(list(
        c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 23L, 
    `1` = 3L, `2` = 1L, `3` = 37L), .Dim = 4L, .Dimnames = structure(list(
        c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 19L, 
    `1` = 2L, `2` = 4L, `3` = 39L), .Dim = 4L, .Dimnames = structure(list(
        c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 22L, 
    `1` = 1L, `2` = 4L, `3` = 37L), .Dim = 4L, .Dimnames = structure(list(
        c("0", "1", "2", "3")), .Names = ""), class = "table"))

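Each table holds counts of observations of the values 0, 1, 2, or 3. However, not every value occurs in every table, so some tables are missing columns. In the final output I want these missing values to be 0.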
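If the raw observations behind each table are still available (an assumption; the question only shows the finished tables), the gap can be avoided when tabulating: a factor with fixed levels makes `table()` report a zero count for every level that never occurs. A minimal sketch with a made-up vector `obs`:

```r
# Hypothetical raw observations for one table: the value 2 never occurs
obs <- c(0, 0, 1, 3, 3, 3)

# Tabulating a factor with fixed levels keeps a (zero) count for every level
tab <- table(factor(obs, levels = 0:3))
tab
```

Every table built this way has the same four entries, so they line up directly when row-bound.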

`merge` doesn't work well on a list, and `rbind` doesn't work either, because not all tables have matching columns.

How can I combine these tables into a single matrix or data.frame with one column per value (0, 1, 2, 3) and one row per table (7 in this example)?

The final output should look like this:

    structure(list(`0` = c(19L, 7L, 14L, 21L, 23L, 19L, 22L), `1` = c(2L, 
    9L, 2L, 8L, 3L, 2L, 1L), `2` = c(0L, 5L, 4L, 2L, 1L, 4L, 4L), 
        `3` = c(43L, 43L, 44L, 33L, 37L, 39L, 37L)), class = "data.frame", row.names = c(NA, 
    -7L))

2 answers:

Answer 0 (score: 1)

We use `map` to convert each table into a one-row data.frame, then use `bind_rows` to row-bind them into a single dataset.

    library(tidyverse)
    # convert each table to a one-row data frame, then stack the rows
    map(lst, as.data.frame.list, check.names = FALSE) %>%
      bind_rows()
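As written, this leaves `NA` wherever a table lacks a value, and because `bind_rows()` adds new columns in order of first appearance, the columns come out as `0, 1, 3, 2`. A minimal base-R sketch of the clean-up step, run on a small hand-built stand-in `df` for that result (two rows only, values taken from the question):

```r
# Stand-in for the bind_rows() result: column "2" is missing for row 1
df <- data.frame(`0` = c(19L, 7L), `1` = c(2L, 9L), `3` = c(43L, 43L),
                 `2` = c(NA, 5L), check.names = FALSE)

df[is.na(df)] <- 0L          # absent values become 0 (columns stay integer)
df <- df[sort(names(df))]    # restore the column order 0, 1, 2, 3
df
```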

Answer 1 (score: 1)

In base R, assuming your list is called `mylist`, you can do the following:

    all_names <- sort(unique(unlist(lapply(mylist, names))))

    res <- do.call("rbind", lapply(mylist, function(x) x[all_names]))
    print(res)
    #      0 1 <NA>  3
    #[1,] 19 2   NA 43
    #[2,]  7 9    5 43
    #[3,] 14 2    4 44
    #[4,] 21 8    2 33
    #[5,] 23 3    1 37
    #[6,] 19 2    4 39
    #[7,] 22 1    4 37
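The `NA` entries come from `x[all_names]`: indexing a one-dimensional table by a name it does not contain returns `NA` (under an `NA` name) instead of an error, so every row has the same length. Since the first table is the one missing "2", its `NA` name also ends up as a column name of `res`. A small illustration with a made-up table `tab`:

```r
# Made-up table with counts for 0, 1 and 3 only (no 2)
tab <- table(c(0, 0, 1, 3))
all_names <- c("0", "1", "2", "3")

# Subsetting by name: the missing "2" comes back as NA
tab[all_names]
```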

You can stop there, or make a couple of edits to finish it off:

    colnames(res) <- all_names  # Ensure correct colnames
    res[is.na(res)] <- 0        # Overwrite NAs with 0
    print(res)
    #      0 1 2  3
    #[1,] 19 2 0 43
    #[2,]  7 9 5 43
    #[3,] 14 2 4 44
    #[4,] 21 8 2 33
    #[5,] 23 3 1 37
    #[6,] 19 2 4 39
    #[7,] 22 1 4 37
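The result above is a matrix, whereas the question asks for a data.frame. A minimal sketch that builds the data.frame directly, starting each row as a vector of zeros and filling in the counts that exist; `combine_tables()` is a hypothetical helper, and the example uses only the first two tables from the question, rebuilt from raw values with `rep()`:

```r
# Hypothetical helper: stack 1-D frequency tables into a data.frame,
# giving a 0 count to any value a table does not contain
combine_tables <- function(tabs) {
  all_names <- sort(unique(unlist(lapply(tabs, names))))
  rows <- lapply(tabs, function(x) {
    v <- setNames(integer(length(all_names)), all_names)  # all zeros
    v[names(x)] <- as.integer(x)                          # observed counts
    v
  })
  as.data.frame(do.call(rbind, rows))
}

# First two tables from the question; the first has no 2s
tabs <- list(table(rep(c(0, 1, 3), c(19, 2, 43))),
             table(rep(c(0, 1, 2, 3), c(7, 9, 5, 43))))
res_df <- combine_tables(tabs)
res_df
```

Writing the zeros up front removes the need for the separate `NA` replacement and column-name fix; applied to the full `mylist`, it should give the seven-row data.frame shown in the question.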