Question

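I have items in different lists, and I want to count the items in each list and output the counts to a table. However, I run into trouble when the lists contain different items. To illustrate my problem: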

item_1 <- c("A","A","B")
item_2 <- c("A","B","B","B","C")
item_3 <- c("C","A")
item_4 <- c("D","A", "A")
item_5 <- c("B","D")


list_1 <- list(item_1, item_2, item_3)
list_2 <- list(item_4, item_5)

table_1 <- table(unlist(list_1))
table_2 <- table(unlist(list_2))

> table_1

A B C 
4 4 2 
> table_2

A B D 
2 1 2

What I get from cbind is:

> cbind(table_1, table_2)

  table_1 table_2
A       4       2
B       4       1
C       2       2

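This is obviously wrong: cbind lines the values up by position, not by name, so the count for D ends up in the C row. What I need is: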

  table_1 table_2
A       4       2
B       4       1
C       2       0
D       0       2

Thanks in advance.

Answer 1

If possible, it is best to use factors from the start, for example:

L <- list(list_1 = list_1, 
          list_2 = list_2)
RN <- unique(unlist(L))
do.call(cbind, 
        lapply(L, function(x)
          table(factor(unlist(x), RN))))
#   list_1 list_2
# A      4      2
# B      4      1
# C      2      0
# D      0      2
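
To see why fixing the levels up front helps, here is a minimal sketch. The vector `x` and the level set `lev` are made up for illustration and are not from the question. `table` reports every level of a factor, including levels that never occur, so tables built on the same levels always line up.

```r
# Illustrative values only; `x` and `lev` are not from the question
lev <- c("A", "B", "C", "D")
x <- c("D", "A", "A", "B", "D")

# Without fixed levels, the absent value "C" is simply missing
t_plain <- table(x)
names(t_plain)
# [1] "A" "B" "D"

# With fixed levels, "C" is kept with a count of 0
t_fixed <- table(factor(x, levels = lev))
t_fixed
#
# A B C D 
# 2 1 0 2 
```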

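However, if you want to keep working with the tables you already have, the following function may be useful. I have added comments to help explain what happens at each step.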

myFun <- function(..., fill = 0) {
  ## Get the names of the ...s. These will be our column names
  CN <- sapply(substitute(list(...))[-1], deparse)
  ## Put the ...s into a list
  Lst <- setNames(list(...), CN)
  ## Get the relevant row names
  RN <- unique(unlist(lapply(Lst, names), use.names = FALSE))
  ## Create an empty matrix. `fill` can be anything--it's set to 0
  M <- matrix(fill, length(RN), length(CN),
              dimnames = list(RN, CN))
  ## Use match to identify the correct row to fill in
  Row <- lapply(Lst, function(x) match(names(x), RN))
  ## use matrix indexing to fill in the unlisted values of Lst
  M[cbind(unlist(Row), 
          rep(seq_along(Lst), vapply(Row, length, 1L)))] <-
    unlist(Lst, use.names = FALSE)
  ## Return your matrix
  M
}

Applied to your two tables, the result is:

myFun(table_1, table_2)
#   table_1 table_2
# A       4       2
# B       4       1
# C       2       0
# D       0       2
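
The fill step in myFun relies on indexing a matrix with a two-column matrix of (row, column) pairs. Here is a small stand-alone sketch of that idiom, using made-up names (`M`, `idx`) and values:

```r
# Illustrative only: a 3 x 2 matrix of zeros with named rows and columns
M <- matrix(0, nrow = 3, ncol = 2,
            dimnames = list(c("A", "B", "C"), c("t1", "t2")))

# Each row of `idx` is one (row, column) position in M
idx <- cbind(c(1, 3, 2),   # row numbers
             c(1, 1, 2))   # column numbers

# Assign one value per position: M[1, 1], M[3, 1], M[2, 2]
M[idx] <- c(10, 20, 30)
M
#   t1 t2
# A 10  0
# B  0 30
# C 20  0
```

In myFun, the row numbers come from match against the combined row names, and the column numbers repeat each table's position once per entry in that table.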

Here is an example that adds another table to the problem. It also demonstrates using NA as the fill value.

set.seed(1) ## So you can get the same results as me
table_3 <- table(sample(LETTERS[3:6], 20, TRUE) )
table_3
# 
# C D E F 
# 2 7 9 2

myFun(table_1, table_2, table_3, fill = NA)
#   table_1 table_2 table_3
# A       4       2      NA
# B       4       1      NA
# C       2      NA       2
# D      NA       2       7
# E      NA      NA       9
# F      NA      NA       2

Answer 2

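To fix your existing problem, you could put the two tables into a list and add the missing values back in by name. Here, nm is a vector of the unique names across all the tables, tbs is a list of the tables, and we can use sapply to append and reorder the missing values.

> nm <- unique(unlist(mget(paste("item", 1:5, sep = "_"))))
> tbs <- list(t1 = table_1, t2 = table_2)
> sapply(tbs, function(x) {
      x[4] <- 0L
      names(x)[4] <- nm[!nm %in% names(x)]
      x[nm]
  })
  t1 t2
A  4  2
B  4  1
C  2  0
D  0  2

A general solution, for when you have an unknown number of missing names and can keep NA values:

> sapply(tbs, function(x) {
      length(x) <- length(nm)
      x <- x[match(nm, names(x))]
      setNames(x, nm)
  })
  t1 t2
A  4  2
B  4  1
C  2 NA
D NA  2

But you can avoid this altogether by going straight from the items to table. You put the items into lists and then unlist them in the very next step. table keeps factor levels in its output even when their count is zero, and its useNA argument controls whether NA values are counted.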

Answer 3

A quick fix for your problem is to convert the tables to data frames and then merge them:

    d1 <- data.frame(value=names(table_1), table_1=as.numeric(table_1))
    d2 <- data.frame(value=names(table_2), table_2=as.numeric(table_2))
    merge(d1,d2, all=TRUE)

This creates NAs where you probably want 0. That can be fixed with:

    M <- merge(d1,d2, all=TRUE) 
    M[is.na(M)] <- 0
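
The same idea extends to more than two tables. The following is a sketch and not part of the original answer: the helper name `as_count_df` and the example data are hypothetical, and it assumes every table is one-dimensional with named entries.

```r
# Hypothetical helper: turn a 1-d table into a two-column data frame
# whose count column is named `col`
as_count_df <- function(tab, col) {
  out <- data.frame(value = names(tab), count = as.numeric(tab),
                    stringsAsFactors = FALSE)
  names(out)[2] <- col
  out
}

# Example tables (made-up data)
tabs <- list(t1 = table(c("A", "A", "B")),
             t2 = table(c("B", "D")),
             t3 = table(c("C", "D", "D")))

# Convert each table, then merge them pairwise on `value`
dfs <- Map(as_count_df, tabs, names(tabs))
M <- Reduce(function(a, b) merge(a, b, by = "value", all = TRUE), dfs)

# Replace the NAs introduced by the outer merge with 0
M[is.na(M)] <- 0
M
```

Reduce applies merge to the first two data frames, then merges that result with the third, and so on, so the list can hold any number of tables.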

Combining tables with different elements
