My data is structured as follows:
Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill",
"Harry", "Harry", "Harry", "Harry","Harry", "Harry", "Harry", "Harry", "Paul", "Paul", "Paul", "Paul"),
Time = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr",
"Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr"),
Power = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570, NA, NA))
Using dplyr, I apply a rolling mean (over windows of 2 to 4 seconds) with the following code:
for (summaryFunction in c("mean")) {
for ( i in seq(2, 4, by = 1)) {
tempColumn <- Individ %>%
group_by(Participant) %>%
transmute(rollapply(Power,
width = i,
FUN = summaryFunction,
align = "right",
fill = NA,
na.rm = T))
colnames(tempColumn)[2] <- paste("Rolling", summaryFunction, as.character(i), sep = ".")
Individ <- bind_cols(Individ, tempColumn[2])
}
}
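I now want to calculate the top 5% of Power for each Participant in each rolling mean. To calculate this, I use:
Output = ddply(Individ, .(Participant, Condition), summarise,
TwoSec <- Rolling.mean.2 > quantile(Rolling.mean.2 , 0.95, na.rm = TRUE))
However, I end up with a column that lists TRUE or FALSE. What I am after instead is the actual values that make up the top 5%. How can I do this? Is there also a simpler way to iterate over each rolling mean column, by Participant and Condition, and find the top 5% of each?
Thanks!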
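Answer 0 (score: 1)
Having the table of rolling means is helpful; it makes computing the quantiles much easier.
Step 1: Group by Participant, Condition, and Location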
Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill",
"Harry", "Harry", "Harry", "Harry","Harry", "Harry", "Harry", "Harry", "Paul", "Paul", "Paul", "Paul"),
Time = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr",
"Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr"),
Location = c("Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away", "Home", "Home", "Home", "Home",
"Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away", "Home", "Home", "Home", "Home"),
Power = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570, NA, NA))
library(dplyr)
library(zoo)
for (summaryFunction in c("mean")) {
for ( i in seq(2, 4, by = 1)) {
tempColumn <- Individ %>%
group_by(Participant) %>%
transmute(rollapply(Power,
width = i,
FUN = summaryFunction,
align = "right",
fill = NA,
na.rm = T))
colnames(tempColumn)[2] <- paste("Rolling", summaryFunction, as.character(i), sep = ".")
Individ <- bind_cols(Individ, tempColumn[2])
}
}
Individ
Participant Time Condition Location Power Rolling.mean.2 Rolling.mean.3 Rolling.mean.4
(fctr) (dbl) (fctr) (fctr) (dbl) (dbl) (dbl) (dbl)
1 Bill 1 Placebo Home 400 NA NA NA
2 Bill 2 Placebo Home 250 325 NA NA
3 Bill 3 Placebo Home 180 215 276.6667 NA
4 Bill 4 Placebo Home 500 340 310.0000 332.5
5 Bill 1 Expr Away 300 400 326.6667 307.5
6 Bill 2 Expr Away 450 375 416.6667 357.5
7 Bill 3 Expr Away 600 525 450.0000 462.5
8 Bill 4 Expr Away 512 556 520.6667 465.5
9 Bill 1 Expr Home 300 406 470.6667 465.5
10 Bill 2 Expr Home 500 400 437.3333 478.0
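To make the Rolling.mean.* columns above easier to read, here is a minimal base R sketch of what a right-aligned rolling mean with fill = NA computes. The helper roll_mean_right is a hypothetical name introduced only for illustration; it is not part of zoo, and it ignores the grouping by Participant.

```r
# Hypothetical helper (not from zoo): right-aligned rolling mean that
# stays NA until a full window of `width` values is available.
roll_mean_right <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < width) return(out)
  for (end in width:n) {
    window <- x[(end - width + 1):end]
    out[end] <- mean(window, na.rm = TRUE)  # NAs inside the window are dropped
  }
  out
}

# Bill's first four Power values from the data above
power <- c(400, 250, 180, 500)
roll_mean_right(power, 2)  # NA 325 215 340, matching rows 1-4 of Rolling.mean.2
```

The first window - 1 positions stay NA, which is why each group starts with NA in the table.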
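Once you have all 7 or 8 columns in the new Individ dataset (this dataset includes Location, which also addresses your other question), here is what I did to solve your problem. I am sure there are cleaner, more efficient ways to do this, but the logic is here and it should produce the right output.
Step 2: Get the quantiles for each group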
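library(plyr)  # loaded after dplyr, so plyr masks summarise(), mutate(), etc.
# Replace NA with 0 so that quantile() does not fail on missing values.
# Note that the zeros count as observations and pull the thresholds down;
# keeping the NAs and passing na.rm = TRUE to quantile() avoids that.
Individ[is.na(Individ)] <- 0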
Top_percentiles <- ddply(Individ,
c("Participant", "Condition", "Location"),
summarise,
Power2 = quantile(Rolling.mean.2, .95),
Power3 = quantile(Rolling.mean.3, .95),
Power4 = quantile(Rolling.mean.4, .95)
)
Top_percentiles
Participant Condition Location Power2 Power3 Power4
1 Bill Expr Away 551.350 510.0667 465.050
2 Bill Expr Home 464.650 465.6667 476.125
3 Bill Placebo Home 337.750 305.0000 282.625
4 Harry Expr Away 585.175 533.4000 485.425
5 Harry Placebo Home 322.150 280.7667 268.175
6 Paul Expr Home 556.500 556.5000 408.000
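These are the top-5% thresholds (the 95th percentiles) for each group and each rolling mean.
The only thing left to do is to find the observations in the dataset that lie above each threshold.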
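As a small illustration of the difference between a TRUE/FALSE column and the actual top values, the sketch below works on one group only (Bill / Placebo / Home, Rolling.mean.2). It is an assumption-laden variant: it keeps the NA and uses na.rm = TRUE instead of the zero-filling used above, so its threshold differs slightly from the table.

```r
# Rolling.mean.2 for Bill / Placebo / Home, with the leading NA kept
x <- c(NA, 325, 215, 340)

# 95th percentile of the non-missing values
threshold <- quantile(x, 0.95, na.rm = TRUE)

# Logical vector: this is what the comparison alone returns
is_top <- !is.na(x) & x >= threshold

# Subsetting with the logical vector returns the actual values
top_values <- x[is_top]

threshold   # 338.5 (the zero-filled version above gives 337.75)
top_values  # 340
```

The comparison only flags rows; indexing the original vector (or filtering the data frame) with that flag is what returns the values themselves.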
Step 3: Match the quantile columns to the original dataset
Something like the following; I was tinkering a bit here.
# match() on a single combined key, so that each row receives the
# thresholds of its own Participant/Condition/Location group
row_key <- paste(Individ$Participant, Individ$Condition, Individ$Location)
group_key <- paste(Top_percentiles$Participant, Top_percentiles$Condition, Top_percentiles$Location)
idx <- match(row_key, group_key)
Individ$Power2 <- Top_percentiles$Power2[idx]
Individ$Power3 <- Top_percentiles$Power3[idx]
Individ$Power4 <- Top_percentiles$Power4[idx]
Individ
Participant Time Condition Location Power Rolling.mean.2 Rolling.mean.3 Rolling.mean.4 Power2 Power3
(fctr) (dbl) (fctr) (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 Bill 1 Placebo Home 400 0 0.0000 0.0 337.750 305.0000
2 Bill 2 Placebo Home 250 325 0.0000 0.0 337.750 305.0000
3 Bill 3 Placebo Home 180 215 276.6667 0.0 337.750 305.0000
4 Bill 4 Placebo Home 500 340 310.0000 332.5 337.750 305.0000
5 Bill 1 Expr Away 300 400 326.6667 307.5 551.350 510.0667
6 Bill 2 Expr Away 450 375 416.6667 357.5 551.350 510.0667
7 Bill 3 Expr Away 600 525 450.0000 462.5 551.350 510.0667
8 Bill 4 Expr Away 512 556 520.6667 465.5 551.350 510.0667
9 Bill 1 Expr Home 300 406 470.6667 465.5 464.650 465.6667
10 Bill 2 Expr Home 500 400 437.3333 478.0 464.650 465.6667
The idea is to attach the quantile columns to the Individ dataset, so that every row carries the thresholds of its own group.
Step 4: Filter the dataset
This should give you what you want.
Option 1: Three separate datasets
top_percentile_2sec <- Individ %>% filter(Rolling.mean.2 >= Power2)
top_percentile_3sec <- Individ %>% filter(Rolling.mean.3 >= Power3)
top_percentile_4sec <- Individ %>% filter(Rolling.mean.4 >= Power4)
Option 2: One large combined dataset
top_percentile_all_times <- Individ %>% filter(Rolling.mean.2 >= Power2 | Rolling.mean.3 >= Power3 | Rolling.mean.4 >= Power4)
top_percentile_all_times
Participant Time Condition Location Power Rolling.mean.2 Rolling.mean.3 Rolling.mean.4 Power2 Power3
(fctr) (dbl) (fctr) (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 Bill 4 Placebo Home 500 340 310.0000 332.50 337.750 305.0000
2 Bill 4 Expr Away 512 556 520.6667 465.50 551.350 510.0667
3 Bill 1 Expr Home 300 406 470.6667 465.50 464.650 465.6667
4 Bill 2 Expr Home 500 400 437.3333 478.00 464.650 465.6667
5 Bill 3 Expr Home 450 475 416.6667 440.50 464.650 465.6667
6 Harry 4 Placebo Home 520 325 286.6667 315.50 322.150 280.7667
7 Harry 4 Expr Away 582 595 547.0000 487.75 585.175 533.4000
8 Paul 3 Expr Home 0 570 480.0000 0.00 556.500 556.5000
9 Paul 4 Expr Home 0 0 570.0000 480.00 556.500 556.5000
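On the question of iterating over every rolling mean column, one possible base R sketch is shown below. It is not the approach used above: it relies on ave() to compute a per-row threshold within each group and lapply() to repeat that for each column. The toy data frame df and the names group_key, roll_cols, and top_rows are assumptions made for illustration; the values are copied from the Expr / Away rows of the tables above.

```r
# Toy data: two groups, values taken from the Expr / Away rows above
df <- data.frame(
  Participant    = rep(c("Bill", "Harry"), each = 4),
  Condition      = "Expr",
  Location       = "Away",
  Rolling.mean.2 = c(400, 375, 525, 556, 415, 380.5, 529.5, 595),
  Rolling.mean.4 = c(307.5, 357.5, 462.5, 465.5, 292.5, 352.75, 472.25, 487.75)
)

# One key per Participant/Condition/Location combination
group_key <- paste(df$Participant, df$Condition, df$Location)

# Every column whose name starts with "Rolling.mean."
roll_cols <- grep("^Rolling\\.mean\\.", names(df), value = TRUE)

top_rows <- lapply(roll_cols, function(col) {
  x <- df[[col]]
  # ave() repeats each group's 95th percentile on every row of that group
  threshold <- ave(x, group_key,
                   FUN = function(v) quantile(v, 0.95, na.rm = TRUE))
  keep <- !is.na(x) & x >= threshold
  df[keep, c("Participant", "Condition", "Location", col)]
})
names(top_rows) <- roll_cols

top_rows$Rolling.mean.2  # one row per group: 556 for Bill, 595 for Harry
```

Because the thresholds are computed per row inside the loop, no separate summary table or lookup step is needed, and adding another window width only adds another Rolling.mean.* column.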
The following link helped me a lot:
how to calculate 95th percentile of values with grouping variable in R or Excel
Does this also solve the problem in your other post?