Two-way contingency table in R

Asked: 2017-09-03 18:09:07

Tags: r two-way contingency

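I have a data frame and want to produce a two-way contingency table from two of its columns. Both columns take the values "Too Little", "About Right" or "Too Much".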

I run

df %>%
  filter(!is.na(col1)) %>%
  group_by(col1) %>%
  summarise(count = n())

on each of the two columns separately and get something like this:

col1        count
<fctr>      <int>
Too Little  19259           
About Right 9539            
Too Much    2816    

What I want to end up with is:

       Too Little   About Right   Too Much   Total
col1   19259        9539          2816       31614
col2   20619        9374          2262       32255
Total  39878        18913         5078       63869

I have been trying to use the table function:

addmargins(table(df$col1, df$col2))

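but the result is not what I want, because it cross-tabulates col1 against col2 instead of counting each column separately: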

              Too Little About Right Too Much   Sum
  Too Little       13770        4424      740 18934
  About Right       4901        3706      700  9307
  Too Much          1250         800      679  2729
  Sum              19921        8930     2119 30970

2 answers:

Answer 0 (score: 3)

I would try tabulate, which table is built on (see ?tabulate). For example, given

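set.seed(123)
vals <- LETTERS[1:3]
# three columns, each holding 5 values sampled from vals
df <- as.data.frame(replicate(3, sample(vals, 5, TRUE)))
# give every column the same full set of levels; factor() also works when the
# columns are character (the default since R 4.0) and never relabels values
df <- data.frame(lapply(df, factor, levels = vals))
# note: sample() results for a given seed changed in R 3.6.0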

you can then do

m <- t(sapply(df, tabulate, nbins = length(vals)))
colnames(m) <- vals
addmargins(m)
#     A B C Sum
# V1  1 1 3   5
# V2  1 3 1   5
# V3  1 2 2   5
# Sum 3 6 6  15

or (via @thelatemail) simply

addmargins(t(sapply(df, table)))
#     A B C Sum
# V1  1 1 3   5
# V2  1 3 1   5
# V3  1 2 2   5
# Sum 3 6 6  15   
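Applied to the layout in the question, the same idea could look like the sketch below. The data frame `df_q` and its values are made up for illustration, since the asker's real data is not shown. The only assumption is that both columns are factors sharing the same three levels. `table` drops `NA` by default, so no separate filtering step is needed.

```r
# Hypothetical stand-in for the asker's data: two factor columns with the
# same three levels and a few missing values.
lv <- c("Too Little", "About Right", "Too Much")
df_q <- data.frame(
  col1 = factor(c("Too Little", "About Right", "Too Little", NA, "Too Much"),
                levels = lv),
  col2 = factor(c("Too Little", "Too Little", NA, "About Right", "Too Little"),
                levels = lv)
)

# One row of counts per column, then row and column totals.
res_q <- addmargins(t(sapply(df_q, table)))

# addmargins labels the totals "Sum"; rename them to match the desired output.
rownames(res_q)[nrow(res_q)] <- "Total"
colnames(res_q)[ncol(res_q)] <- "Total"
res_q
```

Each row of `res_q` is the count for one column of the data frame, which is the shape asked for, rather than the col1-by-col2 cross-tabulation that `table(df$col1, df$col2)` returns.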

Answer 1 (score: 2)

We can call table on each column in a loop, then combine the results with rbind:

# Using dummy data (df) from @lukeA's answer

addmargins(do.call(rbind, lapply(df, table)))
#     A B C Sum
# V1  1 1 3   5
# V2  1 3 1   5
# V3  1 2 2   5
# Sum 3 6 6  15
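Both this approach and the `sapply` variant rely on every column being a factor with the same levels. A small sketch of why that matters (the vectors `x_chr`, `x_fct` and `y_fct` are made up for illustration): `table` on a plain character vector only reports the values that occur, whereas on a factor it keeps a zero count for unused levels, so every row passed to `rbind` has the same length and column order.

```r
lv <- c("A", "B", "C")
x_chr <- c("A", "C", "C")            # "B" never occurs
x_fct <- factor(x_chr, levels = lv)

length(table(x_chr))   # 2: only the observed values A and C
length(table(x_fct))   # 3: A, B (zero count) and C

# With shared levels the per-column tables line up and can be stacked.
y_fct <- factor(c("B", "B", "A"), levels = lv)
stacked <- do.call(rbind, list(x = table(x_fct), y = table(y_fct)))
stacked
```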

Benchmark

# bigger data
set.seed(123)
vals <- LETTERS[1:20]
df1 <- as.data.frame(replicate(20, sample(vals, 100000, TRUE)))
# factor() with explicit levels also works when the columns are character (R >= 4.0)
df1 <- data.frame(lapply(df1, factor, levels = vals))


microbenchmark::microbenchmark(
  lukeA = {
    m1 <- t(sapply(df1, tabulate, nbins = length(vals)))
    colnames(m1) <- vals
    m1 <- addmargins(m1)
  },
  # vals is only used by lukeA's solution, so its creation is timed as well.
  lukeA_1 = {
    vals <- LETTERS[1:20]
    m2 <- t(sapply(df1, tabulate, nbins = length(vals)))
    colnames(m2) <- vals
    m2 <- addmargins(m2)
  },
  thelatemail = {m3 <- addmargins(t(sapply(df1, table)))}, 
  zx8754 = {m4 <- addmargins(do.call(rbind, lapply(df1, table)))}
)
# Unit: milliseconds
#        expr       min        lq      mean    median        uq        max neval
#       lukeA  2.349969  2.371922  2.518447  2.473839  2.558653   3.363738   100
#     lukeA_1  2.351680  2.377196  2.523473  2.473839  2.542831   3.459242   100
# thelatemail 38.316506 42.054136 43.785777 42.674912 44.234193  90.287809   100
#      zx8754 38.695101 41.979728 44.933602 42.762006 44.244314 110.834292   100
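Before reading too much into the timings, it is worth confirming that the three approaches return the same counts. A minimal check on small made-up data (the names `d`, `a1`, `a2` and `a3` are illustrative) might look like this. The gap in the timings above is plausibly due to `table` doing extra work on levels and dimnames before it reaches `tabulate`, though that is an interpretation rather than something the benchmark itself shows.

```r
set.seed(1)
lv <- LETTERS[1:3]

# Small illustrative data: 4 factor columns with shared levels.
cols <- lapply(1:4, function(i) factor(sample(lv, 50, TRUE), levels = lv))
names(cols) <- paste0("V", 1:4)
d <- data.frame(cols)

# tabulate-based approach
a1 <- t(sapply(d, tabulate, nbins = length(lv)))
colnames(a1) <- lv
a1 <- addmargins(a1)

# table-based approaches
a2 <- addmargins(t(sapply(d, table)))
a3 <- addmargins(do.call(rbind, lapply(d, table)))

# Compare the values only; classes and attributes may differ between approaches.
all(as.vector(a1) == as.vector(a2), as.vector(a2) == as.vector(a3))
```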