I have a weighted sample in R, stored in the data frame data_frame:
ID GROUP1 GROUP2 A weight
1 A 1 25 100
2 B 1 31 120
3 C 1 21 70
4 A 2 55 63
5 C 2 8 80
6 C 2 41 80
7 B 1 45 120
8 A 2 23 63
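I want to compute the 5th percentile of variable A for each subgroup (each combination of GROUP1 and GROUP2) and assign that value to every individual in the subgroup (new column = "demanded_column"). I want something like the following, but taking the sample weights into account as well: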
data_frame$demanded_column <- ave(data_frame$A, data_frame$GROUP1, data_frame$GROUP2, FUN = function(x) quantile(x, probs = 0.05, na.rm = TRUE))
Answer 0 (score: 0)
How about this? I use split and Hmisc::wtd.quantile to compute the 5% quantile for each subgroup, then use unsplit to broadcast the results back to the original dimensions:
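# Assumes the table from the question has been copied to the clipboard
df <- read.table("clipboard", header = TRUE)
# Split by GROUP1 and GROUP2 (columns 2:3), dropping empty combinations
v <- lapply(split(df, df[2:3], drop = TRUE), function(x) {
  Hmisc::wtd.quantile(x$A, x$weight, probs = 0.05, na.rm = TRUE)
})
# Broadcast each subgroup's quantile back to its rows
df$q05 <- unsplit(v, df[2:3], drop = TRUE)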
Result:
> df
ID GROUP1 GROUP2 A weight q05
1 1 A 1 25 100 25
2 2 B 1 31 120 31
3 3 C 1 21 70 21
4 4 A 2 55 63 23
5 5 C 2 8 80 8
6 6 C 2 41 80 8
7 7 B 1 45 120 31
8 8 A 2 23 63 23
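If adding Hmisc as a dependency is not an option, the same split-apply-broadcast idea can be sketched in base R. The helper weighted_quantile below is a hypothetical name, not part of any package. It assumes a simple definition, the inverse of the weighted empirical CDF without interpolation, which is not the default definition in Hmisc::wtd.quantile, so the two may differ on other data. On the sample data it gives the same values as the result above.

```r
# Hypothetical helper (not from any package): weighted quantile taken as the
# smallest x whose cumulative weight share reaches p. No interpolation.
weighted_quantile <- function(x, w, p) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= p)[1]]
}

# Sample data from the question
df <- data.frame(
  ID     = 1:8,
  GROUP1 = c("A", "B", "C", "A", "C", "C", "B", "A"),
  GROUP2 = c(1, 1, 1, 2, 2, 2, 1, 2),
  A      = c(25, 31, 21, 55, 8, 41, 45, 23),
  weight = c(100, 120, 70, 63, 80, 80, 120, 63)
)

# ave() hands each group's row indices to FUN, so the helper can look up
# both A and weight for those rows and the result is recycled over the group.
df$q05 <- ave(seq_len(nrow(df)), df$GROUP1, df$GROUP2, FUN = function(i) {
  weighted_quantile(df$A[i], df$weight[i], 0.05)
})

df$q05
# 25 31 21 23 8 8 31 23
```

Passing row indices to ave(), rather than A itself, is what lets a two-argument function (values plus weights) be used with it.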
Answer 1 (score: 0)
You can use dplyr together with magrittr:
library(dplyr) ## Loading dplyr also makes the %>% operator from magrittr available
dframe <- structure(list(ID = 1:8, GROUP1 = structure(c(1L, 2L, 3L, 1L,
3L, 3L, 2L, 1L), .Label = c("A", "B", "C"), class = "factor"),
GROUP2 = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L), A = c(25L, 31L,
21L, 55L, 8L, 41L, 45L, 23L), weight = c(100L, 120L, 70L,
63L, 80L, 80L, 120L, 63L)), .Names = c("ID", "GROUP1", "GROUP2",
"A", "weight"), class = "data.frame", row.names = c(NA, -8L))
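# Group by the two subgroup variables only; Hmisc::wtd.quantile() applies the weights
new_dframe <- dframe %>%
  group_by(GROUP1, GROUP2) %>%
  mutate(demanded_column = Hmisc::wtd.quantile(A, weights = weight, probs = 0.05)[[1]])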
new_dframe
ID GROUP1 GROUP2 A weight demanded_column
1 A 1 25 100 25
2 B 1 31 120 31
3 C 1 21 70 21
4 A 2 55 63 23
5 C 2 8 80 8
6 C 2 41 80 8
7 B 1 45 120 31
8 A 2 23 63 23
I hope this helps.