Question

I'm new to dplyr. This is my simplified data frame:

ID S1 S2 S3
1  45 36 101
1  10 45 101
1  81 81 45
2  45 101 81
2  36 36 45

First, I want to count, per ID, how often each number occurs across S1-S3:

ID 45 36 101 10 81
1  3  1   2   1  2
2  2  2   1   0  1

Then, I want to compute the mean, standard deviation, and CI for each column (except ID). I tried to do the first stage with dplyr:

df %>% summarize(by_SS, count=n())

But this only counts the IDs:

1.0 3
2.0 2

How can I do this?

Answer 1

Here is one suggestion for how to do this:

library(dplyr)
library(reshape)

# reading your example
df <- read.table(text = ('ID S1 S2 S3
1  45 36 101
1  10 45 101
1  81 81 45
2  45 101 81
2  36 36 45'), header = TRUE, stringsAsFactors = FALSE)

# reshaping the data
df_1 <- df %>% melt(id.vars = 'ID')

# getting the results
table(df_1$ID, df_1$value)

#   10 36 45 81 101
# 1  1  1  3  2   2
# 2  0  2  2  1   1
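
Since the question asks for dplyr, the same count table can probably also be built in tidyverse style. This is only a sketch, assuming tidyr 1.1.0 or later (for `pivot_longer()` and a scalar `values_fill`); the names `column`, `value`, and `counts_wide` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# The example data from the question
df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  S1 = c(45, 10, 81, 45, 36),
  S2 = c(36, 45, 81, 101, 36),
  S3 = c(101, 101, 45, 81, 45)
)

counts_wide <- df %>%
  # long format: one row per (ID, column, value)
  pivot_longer(S1:S3, names_to = "column", values_to = "value") %>%
  # number of occurrences of each value within each ID
  count(ID, value) %>%
  # one column per value; combinations that never occur become 0
  pivot_wider(names_from = value, values_from = n, values_fill = 0)

print(counts_wide)
```

Unlike `table()`, this returns a tibble with `ID` as a regular column, which may be easier to pass on to further dplyr steps.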

Answer 2

dplyr is usually easier to work with when the data is in "long format". This gets you the answer:

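 library(dplyr)
 library(tidyr)    # gather()
 library(tibble)   # remove_rownames(), column_to_rownames()
 library(gmodels)  # ci()

 # First part. Reshape to long format and count how often
 # each value occurs within each ID (stored in column n).
 df <- df %>% gather(k, v, S1:S3) %>% 
     add_count(ID, v) %>% 
     mutate(v = as.character(v),
            v = ifelse(is.na(v), "missing", v)) %>% 
     select(-k) %>% 
     distinct() %>% 

 # Second part. For each value, summarize the per-ID counts:
 # mean, ci, and sd. ci() returns the estimate, lower CI, upper CI
 # and standard error; sd is recovered as std. error * sqrt(n).
 # Lastly, make values rownames and transpose.
     group_by(v) %>% 
     summarize(mean      = ci(n)[1],
               lower_ci  = ci(n)[2],
               upper_ci  = ci(n)[3],
               sd        = ci(n)[4]*sqrt(length(n))) %>% 
   remove_rownames() %>% 
   column_to_rownames(var = "v") %>% 
   t()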
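
One caveat: the pipeline above appears to count only the ID/value pairs that actually occur, so a value that is missing for an ID (10 for ID 2 here) is not treated as a count of 0. If zeros should enter the mean, sd, and CI, a base R sketch along the following lines might help. The helper `summarise_col` and the choice of a t-based 95% interval are my own assumptions, not something taken from the question.

```r
# The example data from the question
df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  S1 = c(45, 10, 81, 45, 36),
  S2 = c(36, 45, 81, 101, 36),
  S3 = c(101, 101, 45, 81, 45)
)

# Long format: unlist() stacks S1, S2, S3, so ID is repeated three times
long <- data.frame(
  ID    = rep(df$ID, times = 3),
  value = unlist(df[c("S1", "S2", "S3")], use.names = FALSE)
)

# ID x value count table; combinations that never occur are 0
counts <- table(long$ID, long$value)

# Hypothetical helper: mean, sd and t-based CI of one column of counts
summarise_col <- function(x, level = 0.95) {
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
  c(mean = m, sd = s, lower_ci = m - half, upper_ci = m + half)
}

# One column per value, one row per statistic
stats <- apply(counts, 2, summarise_col)
print(stats)
```

With only two IDs the intervals are very wide, so the numbers are mainly useful for checking that the calculation works before running it on the full data.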

Counting occurrences across multiple columns with R
