Contingency table of a data frame, keeping the first column as the reference

Time: 2019-04-18 13:55:28

Tags: r

This is my input data frame:

df <- data.frame(Col1=c("A", "B", "C", "B", "C", "A", "A", "C"),Col2=c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),Col3=c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"))

df
  Col1 Col2  Col3
1    A Blue Young
2    B  Red   Old
3    C Blue   Old
4    B Blue Young
5    C Blue Young
6    A  Red Young
7    A  Red   Old
8    C Blue   Old

I am trying to get a contingency table like this:

   Blue   Red   Young   Old
A     1     2       2     1
B     1     1       1     1
C     3     0       1     2

I can almost get there with the following command, but Col2 and Col3 end up combined:

# assumes magrittr (for %>%) and reshape2 (for dcast) are attached
as.data.frame(table(df)) %>% dcast(Col1 ~ Col2 + Col3, value.var="Freq")
  Col1 Blue_Old Blue_Young Red_Old Red_Young
1    A        0          1       1         1
2    B        0          1       1         0
3    C        2          1       0         0

3 answers:

Answer 0: (score: 2)

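One option is to gather 'Col2' and 'Col3' into long format, count by 'Col1' and the 'val' column, and then spread the result back to wide format. The columns come out in alphabetical order rather than in the order shown in the question.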

library(tidyverse)
df %>% 
  gather(key, val, Col2:Col3) %>% 
  count(Col1, val) %>% 
  spread(val, n, fill = 0)
# A tibble: 3 x 5
#  Col1   Blue   Old   Red Young
#  <fct> <dbl> <dbl> <dbl> <dbl>
#1 A         1     1     2     2
#2 B         1     1     1     1
#3 C         3     2     0     1
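
In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with the newer verbs follows, assuming tidyr 1.0 or later and dplyr are installed. The intermediate names long_counts and wide are made up for illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data frame; character columns keep the reshape simple
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
  stringsAsFactors = FALSE
)

# Stack Col2 and Col3 into one 'val' column, then count each (Col1, val) pair
long_counts <- df %>%
  pivot_longer(Col2:Col3, names_to = "key", values_to = "val") %>%
  count(Col1, val)

# One column per distinct value; pairs that never occur are filled with 0
wide <- long_counts %>%
  pivot_wider(names_from = val, values_from = n, values_fill = 0L)

wide
```

The counts should match the gather/spread result above; only the function names differ.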

Since the OP is using dcast, a compact option is

library(data.table)
dcast(melt(setDT(df), id.var = 'Col1'), Col1~ value)
#   Col1 Blue Old Red Young
#1:    A    1   1   2     2
#2:    B    1   1   1     1
#3:    C    3   2   0     1
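
Two details of the one-liner above are easy to miss: setDT() converts df to a data.table in place, and dcast() falls back to length as the aggregation function when none is given. The sketch below spells both out, assuming the data.table package is installed. The names dt, long and wide are illustrative and not part of the original answer.

```r
library(data.table)

df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so df itself stays a plain data.frame
dt <- as.data.table(df)

# Long format: one row per (Col1, original column, value)
long <- melt(dt, id.vars = "Col1", measure.vars = c("Col2", "Col3"))

# Count the rows in each (Col1, value) cell explicitly
wide <- dcast(long, Col1 ~ value, fun.aggregate = length, value.var = "value")

wide
```

Naming fun.aggregate explicitly also avoids the message dcast() prints when it has to pick the default itself.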

Answer 1: (score: 2)

Using table

cbind(table(df$Col1,df$Col2),table(df$Col1,df$Col3))

#   Blue Red Old Young
#A    1   2   1     2
#B    1   1   1     1
#C    3   0   2     1

Answer 2: (score: 2)

A base R option that works with any number of columns is

do.call(cbind, lapply(df[-1], function(i) table(df$Col1, i)))
#  Blue Red Old Young
#A    1   2   1     2
#B    1   1   1     1
#C    3   0   2     1
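
The table() based answers list Old before Young because the columns follow the (alphabetical) factor levels, whereas the table requested in the question has Young before Old. One possible way to get the requested order, sketched here under the assumption that the desired level order is known in advance, is to set the factor levels explicitly before tabulating. The name res is illustrative.

```r
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old")
)

# table() orders its columns by factor level, so fix the levels up front
df$Col2 <- factor(df$Col2, levels = c("Blue", "Red"))
df$Col3 <- factor(df$Col3, levels = c("Young", "Old"))

# Cross-tabulate Col1 against every other column and bind the results side by side
res <- do.call(cbind, lapply(df[-1], function(i) table(df$Col1, i)))

res
#   Blue Red Young Old
# A    1   2     2   1
# B    1   1     1   1
# C    3   0     1   2
```

Setting the levels also keeps a column for a value that happens not to occur in the data, which then shows up as all zeros.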