Grouping a binary data frame in R by category

Time: 2018-03-01 16:48:12

Tags: r dataframe aggregate

My data frame df currently looks like this:

  cat 1 2 3 4
1 a   0 1 0 1
2 b   0 0 1 0 
3 b   1 0 1 1 
4 a   1 0 1 1
5 b   1 1 1 1
6 a   0 1 1 0

cat <- c("a", "b", "b", "a", "b", "a")
df <- cbind(cat, data.frame(matrix(c(
  0, 1, 0, 1,  0, 0, 1, 0,  1, 0, 1, 1,
  1, 0, 1, 1,  1, 1, 1, 1,  0, 1, 1, 0
), nrow = 6, byrow = TRUE)))

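(i.e. 2 categories in the first column, and binary data for each category in every subsequent column). Ideally, I would like to group each column by category, but also by binary value, so that each row counts how often a given value occurs within a given category. I want to end up with something like this: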

1 a.0 2 1 1 1
2 a.1 1 2 2 2
3 b.0 1 2 0 1
4 b.1 2 1 3 2

My best attempt so far is:

aggregate(df[,-1], by=list(df[,1]), FUN = table)

but unfortunately this does not get me the result I am after.

3 answers:

Answer 0 (score: 1)

You can count each binary value within a category of the data frame, starting from a comparison like this:

df[df$cat == "a", -1]  == 1

This example is for category a and value 1. The command returns the following table:

     X1    X2    X3    X4
1 FALSE  TRUE FALSE  TRUE
4  TRUE FALSE  TRUE  TRUE
6 FALSE  TRUE  TRUE FALSE

Now you can apply a sum column by column to that result to get one of the rows. In this case it returns row a.1 of the desired data frame:

apply(df[df$cat == "a", -1]  == 1, 2, sum)

In the same way you can find the remaining rows:

apply(df[df$cat == "a", -1]  == 0, 2, sum)
apply(df[df$cat == "a", -1]  == 1, 2, sum)
apply(df[df$cat == "b", -1]  == 0, 2, sum)
apply(df[df$cat == "b", -1]  == 1, 2, sum)

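If you really need to repeat this operation, you can write a loop in which each iteration changes the value of interest according to the value of cat. Note that a for loop does not print its results automatically, and that levels() returns NULL when cat is a character column rather than a factor, so the loop below uses unique() and print():

for (val in unique(as.character(df$cat))) print(apply(df[df$cat == val, -1] == 1, 2, sum))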
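
The four rows can also be collected into a single table in one pass. The following is only a minimal sketch: it rebuilds the question's data with columns X1 to X4, swaps apply(..., 2, sum) for colSums(), and uses `combos` and `counts` as illustrative names that do not appear in the answer.

```r
# Sample data from the question (cat kept as character on purpose)
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0, 0, 1, 1, 1, 0),
  X2 = c(1, 0, 0, 0, 1, 1),
  X3 = c(0, 1, 1, 1, 1, 1),
  X4 = c(1, 0, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Every (category, value) pair, ordered a.0, a.1, b.0, b.1
combos <- expand.grid(value = 0:1, cat = unique(df$cat),
                      stringsAsFactors = FALSE)

# One vector of column-wise counts per pair; mapply() returns them as
# columns, so t() turns each pair into a row
counts <- t(mapply(
  function(ct, v) colSums(df[df$cat == ct, -1] == v),
  combos$cat, combos$value
))
rownames(counts) <- paste(combos$cat, combos$value, sep = ".")
counts
```

Because the comparison `== v` is evaluated for both 0 and 1, a value that never occurs in a group (such as 0 in column X3 of category b) simply yields a count of 0 instead of a missing entry.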

Hope it works, and sorry for my macaroni English.

Answer 1 (score: 1)

Hope this helps!

library(dplyr)
library(tidyr)

df %>%
  gather(key, value, -cat) %>%
  mutate(new_cat=paste(cat, value, sep="_")) %>%
  group_by(new_cat, key) %>%
  tally() %>%
  spread(key, n) %>%
  replace(., is.na(.), 0)

The output is:

  new_cat    X1    X2    X3    X4
1     a_0     2     1     1     1
2     a_1     1     2     2     2
3     b_0     1     2     0     1
4     b_1     2     1     3     2

Sample data:

df <- structure(list(cat = c("a", "b", "b", "a", "b", "a"), X1 = c(0L, 
0L, 1L, 1L, 1L, 0L), X2 = c(1L, 0L, 0L, 0L, 1L, 1L), X3 = c(0L, 
1L, 1L, 1L, 1L, 1L), X4 = c(1L, 0L, 1L, 1L, 1L, 0L)), .Names = c("cat", 
"X1", "X2", "X3", "X4"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
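
In current tidyr releases, gather() and spread() still work but are documented as superseded by pivot_longer() and pivot_wider(). The sketch below shows how the same pipeline might look with the newer verbs. It assumes tidyr 1.1.0 or later (for a scalar `values_fill`) and a dplyr version that provides count(); `res` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, written out with data.frame()
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0L, 0L, 1L, 1L, 1L, 0L),
  X2 = c(1L, 0L, 0L, 0L, 1L, 1L),
  X3 = c(0L, 1L, 1L, 1L, 1L, 1L),
  X4 = c(1L, 0L, 1L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Long form: one row per (cat, column name, binary value)
  pivot_longer(-cat, names_to = "key", values_to = "value") %>%
  # Count rows per combined label and original column
  count(new_cat = paste(cat, value, sep = "_"), key) %>%
  # Back to wide form; combinations that never occur become 0
  pivot_wider(names_from = key, values_from = n, values_fill = 0)
res
```

Here `values_fill = 0` plays the role of the `replace(., is.na(.), 0)` step in the original pipeline.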

Answer 2 (score: 0)

df <- structure(list(cat = c("a", "b", "b", "a", "b", "a"), X1 = c(0L, 
0L, 1L, 1L, 1L, 0L), X2 = c(1L, 0L, 0L, 0L, 1L, 1L), X3 = c(0L, 
1L, 1L, 1L, 1L, 1L), X4 = c(1L, 0L, 1L, 1L, 1L, 0L)), .Names = c("cat", 
"X1", "X2", "X3", "X4"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

groups <- split(df[, -1], df$cat) # Split the binary columns by cat
counts <- lapply(names(groups), function(g)
      {
        # Count 0s and 1s in each column; fixed levels keep a count of 0
        # for a value that never occurs (e.g. 0 in X3 of category b)
        kk <- sapply(groups[[g]], function(x) table(factor(x, levels = 0:1)))
        kk <- data.frame(cat = paste(g, rownames(kk), sep = "."), kk) # Label rows as cat.value
        rownames(kk) <- NULL
        kk
      })
n_df <- do.call(rbind, counts) # Combine list by row
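
The same split-and-count idea can probably be written more compactly with a single cross-tabulation. This is a sketch rather than part of the original answer: it rebuilds the sample data so that it runs on its own, and `long`, `tab` and `wide` are illustrative names.

```r
# Sample data, rebuilt so the block is self-contained
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0L, 0L, 1L, 1L, 1L, 0L),
  X2 = c(1L, 0L, 0L, 0L, 1L, 1L),
  X3 = c(0L, 1L, 1L, 1L, 1L, 1L),
  X4 = c(1L, 0L, 1L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Stack the binary columns: `values` holds 0/1, `ind` the source column
long <- stack(df[-1])
# Columns are stacked one after another, so repeat cat once per column
long$cat <- rep(df$cat, times = ncol(df) - 1)

# Cross-tabulate the cat.value labels against the original column names
tab <- table(paste(long$cat, long$values, sep = "."), long$ind)

# Convert the table to a data frame with cat as a regular column
wide <- as.data.frame.matrix(tab)
wide <- cbind(cat = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

Since table() fills every cell of the cross-tabulation, a label and column pair with no observations shows up as 0 without any extra handling.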