Count of value occurrences by GroupId in R

Time: 2015-03-30 15:06:32

Tags: r aggregate

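I have a dataset with multiple columns, each containing multiple values. What I want is the count of each value in each column, grouped by GroupId.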

Example

 GroupId | C1           | C2
       1 | "valColOne1" | "valColTwo2"
       2 | "valColOne1" | "valColTwo2"
       2 | "valColOne1" | "valColTwo2"
       2 | "valColOne2" | "valColTwo1"
       1 | "valColOne1" | "valColTwo1"

The result should be

    GroupId | valColOne1 | valColOne2 | valColTwo1 | valColTwo2
         1  |    2       |     0      |    1       |  1
         2  |    2       |     1      |    1       |  2

Note that all values in the initial table are strings.

2 answers:

Answer 0: (score: 4)

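Take the original data frame (which I'll call dat) and melt it into long format. Then use dcast to count the number of occurrences of each value within each GroupId.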

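library(reshape2)

# Reshape to long format: one row per (GroupId, variable, value)
dat.m = melt(dat, id.vars = "GroupId")

# Count the rows for each GroupId/value combination
dcast(dat.m, GroupId ~ value, fun.aggregate = length)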

  GroupId   valColOne1    valColOne2   valColTwo1  valColTwo2
1       1             2             0           1           1
2       2             2             1           1           2

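The easiest way to see what each function does is to run them one at a time and look at the intermediate results. For examples, see here and here.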
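To look at those intermediate results, the two steps can be run separately. The following is a minimal sketch, assuming the question's sample data is rebuilt with data.frame() under the name dat. The name counts is only illustrative, and fun.aggregate = length is spelled out here so that dcast does not have to fall back to its default.

```r
library(reshape2)

# Sample data from the question, rebuilt here so the sketch is self-contained
dat <- data.frame(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1"),
  stringsAsFactors = FALSE
)

# Step 1: long format, one row per (GroupId, variable, value)
dat.m <- melt(dat, id.vars = "GroupId")
print(dat.m)  # 10 rows: the 5 C1 values followed by the 5 C2 values

# Step 2: wide format, counting the rows for each GroupId/value pair
counts <- dcast(dat.m, GroupId ~ value, fun.aggregate = length)
print(counts)
```

Printing dat.m shows that melt has stacked C1 and C2 into a single value column, which is what lets dcast count values from both columns in one pass.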

Answer 1: (score: 2)

You can use table from base R:

table(data.frame(GroupId= df1$GroupId, Val=unlist(df1[-1])))
#         Val
# GroupId valColOne1 valColOne2 valColTwo1 valColTwo2
#  1          2          0          1          1
#  2          2          1          1          2
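The one-liner works because unlist stacks C1 and C2 into a single vector of length 10, and data.frame recycles the 5 GroupId values to match. Below is a hedged sketch of the same idea with the recycling written out via rep(), followed by a conversion of the table into an ordinary data frame shaped like the requested result. The names vals, grp, tab and wide are illustrative only.

```r
# Sample data from the question, rebuilt here so the sketch is self-contained
df1 <- data.frame(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1"),
  stringsAsFactors = FALSE
)

# Stack all value columns into one character vector (C1 first, then C2)
vals <- unlist(df1[-1], use.names = FALSE)

# Repeat GroupId once per value column instead of relying on recycling
grp <- rep(df1$GroupId, times = ncol(df1) - 1)

# Cross-tabulate group against value
tab <- table(GroupId = grp, Val = vals)
print(tab)

# Turn the table into a regular data frame with GroupId as a column
wide <- as.data.frame.matrix(tab)
wide <- cbind(GroupId = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```

Writing the rep() call explicitly makes the alignment between groups and values visible, which helps if the data frame later gains more value columns.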

Data

df1 <- structure(list(GroupId = c(1, 2, 2, 2, 1),
    C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
    C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1")),
    .Names = c("GroupId", "C1", "C2"), row.names = c(NA, -5L), class = "data.frame")