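I have a dataset with several columns, each holding multiple values. What I want is the count of each value in each column, grouped by GroupId.
Example: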
GroupId | C1 | C2
1 | "valColOne1" | "valColTwo2"
2 | "valColOne1" | "valColTwo2"
2 | "valColOne1" | "valColTwo2"
2 | "valColOne2" | "valColTwo1"
1 | "valColOne1" | "valColTwo1"
The result should be:
GroupId | valColOne1 | valColOne2 | valColTwo1 | valColTwo2
1 | 2 | 0 | 1 | 1
2 | 2 | 1 | 1 | 2
Note that all values in the initial table are strings.
Answer 0 (score: 4)
Use melt to convert the original data frame (which I've called dat) to long format. Then use dcast to count the occurrences of each value.
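library(reshape2)
# long format: one row per (GroupId, value) pair
dat.m <- melt(dat, id.vars = "GroupId")
# one column per distinct value; length() counts occurrences per GroupId
dcast(dat.m, GroupId ~ value, fun.aggregate = length)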
  GroupId valColOne1 valColOne2 valColTwo1 valColTwo2
1       1          2          0          1          1
2       2          2          1          1          2
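The answer above does not show how dat is built. The following is a self-contained sketch that assumes dat holds the example data from the question (constructed here with data.frame(); the name counts is illustrative). It also spells out fun.aggregate = length so that dcast does not need to fall back to its default aggregation:

```r
library(reshape2)

# Example data from the question, stored as `dat` (the name used in this answer)
dat <- data.frame(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1"),
  stringsAsFactors = FALSE
)

# Long format: C1 and C2 are stacked into a single `value` column,
# giving one row per (GroupId, original column, value), i.e. 10 rows here
dat.m <- melt(dat, id.vars = "GroupId")

# Wide format: one column per distinct value; length() counts how many
# rows each GroupId has for that value (missing combinations become 0)
counts <- dcast(dat.m, GroupId ~ value, fun.aggregate = length)
print(counts)
```

Because melt puts every column other than GroupId into one value column, this approach works unchanged for any number of value columns.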
Answer 1 (score: 2)
You can use table from base R:
table(data.frame(GroupId= df1$GroupId, Val=unlist(df1[-1])))
#        Val
# GroupId valColOne1 valColOne2 valColTwo1 valColTwo2
#       1          2          0          1          1
#       2          2          1          1          2
# data
df1 <- data.frame(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1"),
  stringsAsFactors = FALSE
)
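To show why the base R one-liner works, the sketch below breaks it into separate steps. It repeats the example data so that it runs on its own, and the names long, tab and res are illustrative. The final step, which converts the table with as.data.frame.matrix(), goes beyond the original answer. It is there for cases where you want a plain data frame with a GroupId column, as in the question's desired result:

```r
# Example data from the question (same as df1 above)
df1 <- data.frame(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1"),
  stringsAsFactors = FALSE
)

# unlist(df1[-1]) stacks C1 and then C2 into one vector of length 10.
# data.frame() recycles the length-5 GroupId to length 10, so each value
# stays paired with the GroupId of the row it came from
long <- data.frame(GroupId = df1$GroupId, Val = unlist(df1[-1]))

# Cross-tabulate: rows are GroupId levels, columns are distinct values
tab <- table(long)

# Turn the 2-D table into an ordinary data frame with GroupId as a column
res <- data.frame(
  GroupId = as.numeric(rownames(tab)),
  as.data.frame.matrix(tab),
  row.names = NULL
)
print(res)
```

The recycling step relies on every value column having the same number of rows as GroupId, which always holds for columns of a single data frame.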