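I have a data frame containing groups and values. First, I compute the 99% quantile for each group. Now I want to remove the values that lie above each group's 99% quantile.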
df<-data.frame(group = rep(c("A", "B"), each = 4),
value = c(c(6,5,80,4,60)*10,3,5,4))
# data
group value
1 A 60
2 A 50
3 A 800
4 A 40
5 B 600
6 B 3
7 B 5
8 B 4
Compute the quantiles for each group:
quant<-aggregate(df$value, by = list(df$group), FUN = quantile, probs = 0.99)
> quant
Group.1 x
1 A 777.80
2 B 582.15
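I tried applying the quantile vector to select the lower values. However, this ignores the grouping: quant$x has only two elements, so it is recycled across the rows instead of being matched to each row's group.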
df[df$value < quant$x,]
Expected result:
group value
1 A 60
2 A 50
4 A 40
5 B 3
6 B 5
7 B 4
How can I apply the quantile vector by group so that only the values below the 99% quantile are kept in the data frame?
Answer 0 (score: 4)
After grouping by group, we can use filter from dplyr:
library(dplyr)
df %>%
group_by(group) %>%
filter(value < quantile(value, probs = 0.99))
# A tibble: 6 x 2
# Groups: group [2]
# group value
# <fctr> <dbl>
#1 A 60
#2 A 50
#3 A 40
#4 B 3
#5 B 5
#6 B 4
Or use data.table:
library(data.table)
setDT(df)[, .(value = value[value < quantile(value, probs = 0.99)]), by = group]
Or use ave from base R.
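The base R code for this option is not shown here, so the following is only a minimal sketch of how ave could be used, not necessarily the answerer's exact code. It assumes the df from the question; the names threshold and result are introduced purely for illustration. ave returns one value per row (here, the 99% quantile of that row's group), so the comparison is aligned by group rather than recycled.

```r
# Example data from the question
df <- data.frame(group = rep(c("A", "B"), each = 4),
                 value = c(c(6, 5, 80, 4, 60) * 10, 3, 5, 4))

# ave() returns a vector as long as df$value that holds,
# for every row, the 99% quantile of that row's group
# ("threshold" is an illustrative name, not from the original post)
threshold <- ave(df$value, df$group,
                 FUN = function(x) quantile(x, probs = 0.99))

# Keep only the rows strictly below their own group's threshold
result <- df[df$value < threshold, ]
result
```

Note that FUN has to be passed by name, because ave treats every unnamed argument after the first as a grouping variable.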