This is my input data frame:
df <- data.frame(Col1=c("A", "B", "C", "B", "C", "A", "A", "C"),Col2=c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),Col3=c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"))
df
  Col1 Col2  Col3
1    A Blue Young
2    B  Red   Old
3    C Blue   Old
4    B Blue Young
5    C Blue Young
6    A  Red Young
7    A  Red   Old
8    C Blue   Old
I am trying to get a contingency table like the one below:
  Blue Red Young Old
A    1   2     2   1
B    1   1     1   1
C    3   0     1   2
I can almost get there with the following command, but the levels of Col2 and Col3 end up combined:
as.data.frame(table(df)) %>% dcast(Col1 ~ Col2 + Col3, value.var="Freq")
  Col1 Blue_Old Blue_Young Red_Old Red_Young
1    A        0          1       1         1
2    B        0          1       1         0
3    C        2          1       0         0
Answer 0 (score: 2)
One option is to gather 'Col2' and 'Col3' into long format, count the combinations of 'Col1' and the 'val' column, and then spread the result back to wide format:
library(tidyverse)
df %>%
  gather(key, val, Col2:Col3) %>%
  count(Col1, val) %>%
  spread(val, n, fill = 0)
# A tibble: 3 x 5
#  Col1   Blue   Old   Red Young
#  <fct> <dbl> <dbl> <dbl> <dbl>
#1 A         1     1     2     2
#2 B         1     1     1     1
#3 C         3     2     0     1
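Note that gather and spread are superseded in current tidyr releases. A rough equivalent with pivot_longer and pivot_wider might look like the sketch below. The final select call is an addition, not part of the answer above; it only reorders the columns to match the layout asked for in the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old")
)

res <- df %>%
  # Stack Col2 and Col3 into a single 'val' column (long format)
  pivot_longer(c(Col2, Col3), names_to = "key", values_to = "val") %>%
  # Count occurrences of each Col1/val pair
  count(Col1, val) %>%
  # One column per level; missing pairs become 0
  pivot_wider(names_from = val, values_from = n, values_fill = 0L) %>%
  # Assumed column order, taken from the question's desired output
  select(Col1, Blue, Red, Young, Old)

res
```

The 0L fill keeps the counts as integers; a plain 0 should behave the same way here.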
Since the OP is already using dcast, a compact option is:
library(data.table)
dcast(melt(setDT(df), id.var = 'Col1'), Col1~ value)
#   Col1 Blue Old Red Young
#1:    A    1   1   2     2
#2:    B    1   1   1     1
#3:    C    3   2   0     1
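The one-liner relies on dcast falling back to length as the aggregation function, and setDT converts df to a data.table by reference. A slightly more explicit sketch of the same idea is given below, assuming you would rather keep the original df untouched; the names dt, long, and wide are only illustrative.

```r
library(data.table)

# Same data as in the question
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old")
)

# as.data.table() makes a copy, so df itself stays a plain data.frame
dt <- as.data.table(df)

# Long format: one row per (Col1, variable, value)
long <- melt(dt, id.vars = "Col1", measure.vars = c("Col2", "Col3"))

# Wide format: count rows per Col1/value pair, stating the aggregation explicitly
wide <- dcast(long, Col1 ~ value, value.var = "value", fun.aggregate = length)

# Optional: reorder columns to match the layout asked for in the question
setcolorder(wide, c("Col1", "Blue", "Red", "Young", "Old"))

wide
```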
Answer 1 (score: 2)
Using table:
cbind(table(df$Col1,df$Col2),table(df$Col1,df$Col3))
#  Blue Red Old Young
#A    1   2   1     2
#B    1   1   1     1
#C    3   0   2     1
Answer 2 (score: 2)
A base R option that works with any number of columns is:
do.call(cbind, lapply(df[-1], function(i) table(df$Col1, i)))
#  Blue Red Old Young
#A    1   2   1     2
#B    1   1   1     1
#C    3   0   2     1
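The columns come out in alphabetical order within each source column (Old before Young). If the exact order from the question matters, one way to get it is to index the resulting matrix by column name, as in this sketch. The names tab and wanted are only illustrative. Also note that if two columns shared a level name, the combined matrix would end up with duplicate column names.

```r
# Same data as in the question
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old")
)

# Cross-tabulate Col1 against every other column and bind the tables side by side
tab <- do.call(cbind, lapply(df[-1], function(i) table(df$Col1, i)))

# Assumed column order, taken from the question's desired output
wanted <- c("Blue", "Red", "Young", "Old")
tab <- tab[, wanted]

tab
```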