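I have a data frame with three columns: ID, Trial, and a difference measure (diff_DT). I have 19 participants, each of whom completed 30 trials. This is what my data frame looks like: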
ID Trial diff_DT
01 005 37,5
01 006 40,5
01 007 16,5
... ... ...
02 005 16,5
... ... ...
02 016 27,9
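Each block consists of 6 of the 30 trials: block 1: trials 5-10, block 2: trials 16-21, block 3: trials 26-31, block 4: trials 36-41, block 5: trials 46-51 (note: the trial numbers exceed 30 because participants completed additional trials).
Now I need the mean of diff_DT for each participant within each block, which gives five means per participant. I don't know how to do this properly. Thanks for your advice!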
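Answer 0 (score: 1)
You can create a separate key data frame or matrix that maps trials to blocks, merge it into the original table, and then run aggregate to get the mean scores.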
ID <- c(rep(1, 3), 2, 2)
Trial <- c(5, 6, 7, 5, 16)
diff_DT <- c(37.5, 40.5, 16.5, 16.5, 27.9)
Trial.key <- c(5:10, 16:21, 26:31, 36:41, 46:51)
block <- rep(1:5, each = 6)
df <- data.frame(ID, Trial, diff_DT)
blocks <- data.frame(Trial.key, block)
df.blocks <- merge(df, blocks, by.x = "Trial", by.y = "Trial.key", all.x = TRUE,
all.y = FALSE)
df.blocks
# Trial ID diff_DT block
# 5 1 37.5 1
# 5 2 16.5 1
# 6 1 40.5 1
# 7 1 16.5 1
# 16 2 27.9 2
# Group by participant and block (not by Trial) to get one mean per block
df.agg <- with(df.blocks, aggregate(diff_DT, by = list(ID, block),
                                    FUN = "mean"))
names(df.agg) <- c("ID", "block", "mean.diff_DT")
df.agg
# ID block mean.diff_DT
# 1 1 31.5
# 2 1 16.5
# 2 2 27.9
Answer 1 (score: 0)
See if this helps.
bd <- data.frame(ID = rep(1:6, each = 30),
                 Trial = c(sample(c(5:10,16:21,26:31,36:41,46:51), 30),
                           sample(c(5:10,16:21,26:31,36:41,46:51), 30),
                           sample(c(5:10,16:21,26:31,36:41,46:51), 30),
                           sample(c(5:10,16:21,26:31,36:41,46:51), 30),
                           sample(c(5:10,16:21,26:31,36:41,46:51), 30),
                           sample(c(5:10,16:21,26:31,36:41,46:51), 30)),
                 diff_DT = rnorm(n = 180, mean = 30, sd = 2))
library(dplyr)
bd <- bd %>%
  mutate(block = ifelse(Trial <= 10, 1,
                 ifelse(Trial <= 21, 2,
                 ifelse(Trial <= 31, 3,
                 ifelse(Trial <= 41, 4, 5)))))
bd %>%
  group_by(ID, block) %>%
  summarise(Mean = mean(diff_DT))
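As a possible alternative to the nested ifelse calls, the block can be looked up from a key vector with match. This is only a sketch: the names trial_key, block_key, block_of and bd_small are made up for illustration, and trials that belong to no block (for example 11-15) come back as NA instead of being folded into a neighboring block.

```r
# Hypothetical helper: map trial numbers to blocks via a lookup table.
trial_key <- c(5:10, 16:21, 26:31, 36:41, 46:51)  # trials that belong to a block
block_key <- rep(1:5, each = 6)                   # block of each trial in trial_key

block_of <- function(trial) {
  # match() gives the position of each trial in trial_key, or NA if absent
  block_key[match(trial, trial_key)]
}

# Small deterministic example (the rows shown in the question)
bd_small <- data.frame(ID = c(1, 1, 1, 2, 2),
                       Trial = c(5, 6, 7, 5, 16),
                       diff_DT = c(37.5, 40.5, 16.5, 16.5, 27.9))
bd_small$block <- block_of(bd_small$Trial)

# Mean of diff_DT per participant and block
block_means <- aggregate(diff_DT ~ ID + block, data = bd_small, FUN = mean)
block_means
```

The same block_of call could be used inside mutate in place of the ifelse chain.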
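Answer 2 (score: 0)
If you want to use only base R, you can create a block column in the data frame and then apply the mean function for each participant within each block.
If Trial is numeric (which may not be the case, given that your trials look like 001, 002, ...), you can do:
df$block = ifelse(df$Trial >= 5 & df$Trial <= 10, 1,
           ifelse(df$Trial >= 16 & df$Trial <= 21, 2,
           ifelse(df$Trial >= 26 & df$Trial <= 31, 3,
           ifelse(df$Trial >= 36 & df$Trial <= 41, 4,
           ifelse(df$Trial >= 46 & df$Trial <= 51, 5, 0))))
)
If Trial is not numeric (e.g. character or factor), convert it to numeric first with
df$Trial = as.numeric(as.character(df$Trial))
Then run
aggregate(df$diff_DT, by = list(df$block, df$ID), mean)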
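The sample in the question shows zero-padded trial numbers (005) and decimal commas (37,5), so both columns may have been read as text. The following base R sketch assumes exactly that; the data frame name raw and the character column types are assumptions, not something the question confirms.

```r
# Assumed input: all columns were read as character strings
raw <- data.frame(ID = c("01", "01", "01", "02", "02"),
                  Trial = c("005", "006", "007", "005", "016"),
                  diff_DT = c("37,5", "40,5", "16,5", "16,5", "27,9"),
                  stringsAsFactors = FALSE)

# "005" -> 5; as.character() also keeps this safe if Trial is a factor
raw$Trial <- as.numeric(as.character(raw$Trial))
# "37,5" -> 37.5: swap the decimal comma for a point before converting
raw$diff_DT <- as.numeric(sub(",", ".", as.character(raw$diff_DT), fixed = TRUE))

# Trials 5-10 -> 1, 16-21 -> 2, 26-31 -> 3, 36-41 -> 4, 46-51 -> 5; else NA
starts <- c(5, 16, 26, 36, 46)
raw$block <- NA_integer_
for (b in seq_along(starts)) {
  in_block <- raw$Trial >= starts[b] & raw$Trial <= starts[b] + 5
  raw$block[in_block] <- b
}

# Mean of diff_DT per block and participant
res <- aggregate(raw$diff_DT, by = list(block = raw$block, ID = raw$ID), FUN = mean)
names(res)[3] <- "mean_diff_DT"
res
```

If the data come from a file, passing dec = "," to read.table (or using read.csv2, whose default is dec = ",") would handle the decimal commas at import time instead.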
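Answer 3 (score: 0)
I wrote this data frame as an example (you should provide code that generates your data, so that the question can be answered more easily and more accurately):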
ID <- rep(1:3, 47)
trial <- rep(5:51, 3)
diff_DT <- sample(1:10, 47*3, replace = TRUE)
df <- data.frame(ID, trial, diff_DT)
Then I wrote a function that computes the blocks, assigned exactly as you describe in the question; if you need any preprocessing, just ask:
computeBlocks <- function(df){
  # Trials outside every block keep the value NA
  block <- rep(NA, nrow(df))
  for(i in seq_along(block)){
    trial_i <- as.numeric(df$trial[i])
    # Blocks 2-5: trials 16-21, 26-31, 36-41, 46-51
    for(j in 1:4){
      if(trial_i >= 6+10*j && trial_i <= 11+10*j){
        block[i] <- j+1
        break
      }
    }
    # Block 1: trials 5-10
    if(trial_i >= 5 && trial_i <= 10){
      block[i] <- 1
    }
  }
  df <- cbind(df, block)
  return(df)
}
I computed the blocks:
df <- computeBlocks(df)
Finally, using the reshape2 package, I computed the mean for each participant within each block:
#install.packages("reshape2")
require(reshape2)
df_melt <- melt(df, id = c("ID", "block"))
means <- dcast(df_melt, ID + block ~ variable, mean)[,-3]
means
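If reshape2 is not available, a participants-by-blocks table of means could also be built with base R's tapply. This is a sketch on a small made-up data set, not the data from the question; demo and mean_table are hypothetical names.

```r
# Made-up data: 2 participants, every trial of blocks 1 and 2
demo <- data.frame(ID = rep(1:2, each = 12),
                   trial = rep(c(5:10, 16:21), times = 2),
                   diff_DT = c(rep(10, 6), rep(20, 6),   # participant 1
                               rep(30, 6), rep(40, 6)))  # participant 2
demo$block <- rep(rep(1:2, each = 6), times = 2)

# Rows are participants, columns are blocks, cells are the mean diff_DT
mean_table <- with(demo, tapply(diff_DT, list(ID = ID, block = block), mean))
mean_table
```

With the full data set this layout would give one row per participant and one column per block, that is, five means per participant.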
Your question is not entirely clear, so let me know if this needs improvement.