I am new to dplyr. Here is my simplified data frame:
ID S1 S2 S3
1 45 36 101
1 10 45 101
1 81 81 45
2 45 101 81
2 36 36 45
First, I want to count, for each ID, how often each number occurs across S1-S3:
ID 45 36 101 10 81
1 3 1 2 1 2
2 2 2 1 0 1
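Then I want to compute the mean, standard deviation, and confidence interval (CI) for each column (except ID). I tried the first step with dplyr: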
df %>% summarize(by_SS, count=n())
But this only counted the rows per ID:
1.0 3
2.0 2
How can I do this?
Answer 0 (score: 0)
Here is one suggestion for how to do this:
library(dplyr)
library(reshape)
# reading your example
df <- read.table(text = ('ID S1 S2 S3
1 45 36 101
1 10 45 101
1 81 81 45
2 45 101 81
2 36 36 45'), header = TRUE, stringsAsFactors = FALSE)
# reshaping the data
df_1 <- df %>% melt(id.vars = 'ID')
# getting the results
table(df_1$ID, df_1$value)
#     10 36 45 81 101
#   1  1  1  3  2   2
#   2  0  2  2  1   1
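Since the question asks for dplyr, the same count table can also be sketched with dplyr and tidyr alone. This is only a minimal sketch: it assumes a tidyr version that provides `pivot_longer()` and `pivot_wider()`, and the names `column`, `value`, and `counts` are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- read.table(text = "ID S1 S2 S3
1 45 36 101
1 10 45 101
1 81 81 45
2 45 101 81
2 36 36 45", header = TRUE)

counts <- df %>%
  # wide -> long: one row per (ID, column, value)
  pivot_longer(S1:S3, names_to = "column", values_to = "value") %>%
  # number of occurrences of each value within each ID
  count(ID, value) %>%
  # long -> wide: one column per value, 0 where a value never occurs
  pivot_wider(names_from = value, values_from = n, values_fill = 0)

counts
```

Unlike `table()`, this returns a tibble with `ID` as a regular column, which may be more convenient for further dplyr steps.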
Answer 1 (score: 0)
dplyr is usually easier to use with data in "long format". Here is how to get the answer:
library(dplyr)
library(tidyr)    # gather()
library(tibble)   # remove_rownames(), column_to_rownames()
library(gmodels)  # ci()
# First part: reshape to long format and count each value per ID
df <- df %>% gather(k, v, S1:S3) %>%
  add_count(ID, v) %>%
  mutate(v = as.character(v),
         v = ifelse(is.na(v), "missing", v)) %>%
  select(-k) %>%
  distinct() %>%
  # Second part. For each value, summarize its per-ID counts:
  # mean, CI, and sd. ci() returns the estimate, the CI bounds, and the
  # standard error, so sd is recovered as se * sqrt(n).
  # Lastly, make values rownames and transpose.
  group_by(v) %>%
  summarize(mean = ci(n)[1],
            lower_ci = ci(n)[2],
            upper_ci = ci(n)[3],
            sd = ci(n)[4] * sqrt(length(n))) %>%
  remove_rownames() %>%
  column_to_rownames(var = "v") %>%
  t()
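For comparison, the second stage can also be sketched in base R without gmodels. This is a sketch under stated assumptions, not the answerer's method: the count matrix is typed in by hand from the example data, zero counts are kept as observations (the pipeline above drops ID/value pairs that never occur), the interval is a t-based 95% CI, and `summarize_column` is a hypothetical helper name.

```r
# Counts per ID (rows) and value (columns), taken from the example data
counts <- matrix(c(1, 1, 3, 2, 2,
                   0, 2, 2, 1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(ID = c("1", "2"),
                                 value = c("10", "36", "45", "81", "101")))

# Hypothetical helper: mean, sd, and t-based confidence interval of a vector
summarize_column <- function(x, level = 0.95) {
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  # half-width of the interval: t quantile times the standard error
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
  c(mean = m, sd = s, lower_ci = m - half_width, upper_ci = m + half_width)
}

# Apply the helper to every value column; the result has one column per value
stats <- apply(counts, 2, summarize_column)
stats
```

With only two IDs the intervals are very wide, so the numbers are mainly useful for checking the mechanics.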