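I have a data set generated by different participants, who rated different visual impairments by giving each a technical quality score between 1 and 5.
In the sample data, the Participant column is text (different IDs), the impairment column is text (9 different types), and the TechnicalQuality column is numeric (1-5).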
Participant <- c("A001", "A001", "A001", "A002", "A002", "A003", "B001", "B002")
impairment <- c("H0", "H1", "H3", "H2", "H4", "H2", "H3", "H0")
TechnicalQuality <- c(1, 2, 4, 3, 5, 4, 3, 1)
Exp_1 <- data.frame(Participant = Participant, impairment = impairment,
                    TechnicalQuality = TechnicalQuality)
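I want to create a new data frame, P_TQ_Mean, containing the mean technical quality for each impairment type for each participant. I use the following code: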
P_TQ_Mean <- c()
for (i in unique(Exp_1$Participant)) {
  d <- subset(Exp_1, Exp_1$Participant == i)
  c <- aggregate(d$TechnicalQuality, list(d$impairment), mean)
  P_TQ_Mean = rbind(P_TQ_Mean, c)
}
The resulting P_TQ_Mean is:
  Group.1 x
1      H0 1
2      H1 2
3      H3 4
4      H2 3
5      H4 5
6      H2 4
7      H3 3
8      H0 1
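This shows that "A001" has three means, for H0, H1 and H3 respectively, and so on for the other participants.
Is there a way to add a column indicating the participant ID for each impairment's mean, and to label the first two columns properly? For example, I need to label Group.1 as "Impairment" and x as "Participant" for further processing.
Thanks in advance!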
Answer 0 (score: 3)
There is no need for a for loop. aggregate can handle multiple grouping variables; try
aggregate(TechnicalQuality ~ impairment + Participant, Exp_1, mean)
#   impairment Participant TechnicalQuality
# 1         H0        A001                1
# 2         H1        A001                2
# 3         H3        A001                4
# 4         H2        A002                3
# 5         H4        A002                5
# 6         H2        A003                4
# 7         H3        B001                3
# 8         H0        B002                1
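Since the question also asks about column labels, here is a minimal base R sketch of renaming the columns of the aggregated result. The labels "Impairment" and "MeanTechnicalQuality" are illustrative assumptions, not names from the question's downstream code, so substitute whatever your further processing expects.

```r
# Sample data from the question
Exp_1 <- data.frame(
  Participant = c("A001", "A001", "A001", "A002", "A002", "A003", "B001", "B002"),
  impairment = c("H0", "H1", "H3", "H2", "H4", "H2", "H3", "H0"),
  TechnicalQuality = c(1, 2, 4, 3, 5, 4, 3, 1)
)

# Mean technical quality per impairment and participant
P_TQ_Mean <- aggregate(TechnicalQuality ~ impairment + Participant,
                       data = Exp_1, FUN = mean)

# Rename the columns (these labels are assumptions chosen for illustration)
names(P_TQ_Mean) <- c("Impairment", "Participant", "MeanTechnicalQuality")

print(P_TQ_Mean)
```

Assigning to names() replaces all labels at once, so the vector must list them in the same order as the columns of the result.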
Or, for some more efficient options:
# install.packages("data.table")
library(data.table)
setDT(Exp_1)[, list(TechnicalQuality = mean(TechnicalQuality)), by = list(Participant, impairment)]
#    Participant impairment TechnicalQuality
# 1:        A001         H0                1
# 2:        A001         H1                2
# 3:        A001         H3                4
# 4:        A002         H2                3
# 5:        A002         H4                5
# 6:        A003         H2                4
# 7:        B001         H3                3
# 8:        B002         H0                1
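Note that setDT converts Exp_1 to a data.table by reference, so the original object changes class. If that side effect is unwanted, a variant along these lines may help; it is only a sketch, and the result column name MeanTechnicalQuality is an assumption chosen for illustration.

```r
library(data.table)

# Sample data from the question
Exp_1 <- data.frame(
  Participant = c("A001", "A001", "A001", "A002", "A002", "A003", "B001", "B002"),
  impairment = c("H0", "H1", "H3", "H2", "H4", "H2", "H3", "H0"),
  TechnicalQuality = c(1, 2, 4, 3, 5, 4, 3, 1)
)

# as.data.table() returns a copy, so Exp_1 itself remains a plain data.frame
P_TQ_Mean <- as.data.table(Exp_1)[
  , .(MeanTechnicalQuality = mean(TechnicalQuality)),  # hypothetical column name
  by = .(Participant, impairment)
]

print(P_TQ_Mean)
```

The by columns keep their original names, and the summary column takes the name given inside .().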
Or
# install.packages("dplyr")
library(dplyr)
Exp_1 %>%
group_by(Participant, impairment) %>%
summarise(mean(TechnicalQuality))
# Source: local data table [8 x 3]
# Groups: Participant
#
#   Participant impairment mean(TechnicalQuality)
# 1        A001         H0                      1
# 2        A001         H1                      2
# 3        A001         H3                      4
# 4        A002         H2                      3
# 5        A002         H4                      5
# 6        A003         H2                      4
# 7        B001         H3                      3
# 8        B002         H0                      1
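The summary column above ends up named mean(TechnicalQuality), which is awkward to refer to later. A minimal sketch of naming it directly and relabelling impairment follows; the names MeanTechnicalQuality and Impairment are assumptions for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question
Exp_1 <- data.frame(
  Participant = c("A001", "A001", "A001", "A002", "A002", "A003", "B001", "B002"),
  impairment = c("H0", "H1", "H3", "H2", "H4", "H2", "H3", "H0"),
  TechnicalQuality = c(1, 2, 4, 3, 5, 4, 3, 1)
)

P_TQ_Mean <- Exp_1 %>%
  group_by(Participant, impairment) %>%
  # Name the summary column explicitly; drop the grouping afterwards
  summarise(MeanTechnicalQuality = mean(TechnicalQuality), .groups = "drop") %>%
  # rename() takes new_name = old_name
  rename(Impairment = impairment)

print(P_TQ_Mean)
```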