Calculating subgroup quantiles in a weighted sample in R

Date: 2016-07-17 16:04:37

Tags: r

I have a weighted sample, data_frame, in R:

ID  GROUP1 GROUP2   A       weight
1   A      1        25      100     
2   B      1        31      120     
3   C      1        21      70      
4   A      2        55      63      
5   C      2        8       80      
6   C      2        41      80      
7   B      1        45      120     
8   A      2        23      63      

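For each subgroup (each combination of GROUP1 and GROUP2) I want to compute the 5th percentile of variable A and assign that value to every individual in the subgroup (new column "demanded_column"). I want something like the following, but taking the sample weights into account: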

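# Unweighted version: 5th percentile of A within each GROUP1 x GROUP2 subgroup
data_frame$demanded_column <- ave(data_frame$A, data_frame$GROUP1, data_frame$GROUP2,
                                  FUN = function(x) quantile(x, probs = 0.05, na.rm = TRUE))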
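One base-R way to keep the ave() structure while still using the weights is sketched below. This is a sketch under stated assumptions, not code from the question or the answers: weighted_quantile is a hypothetical helper that defines the weighted p-quantile as the smallest value of A whose cumulative share of the subgroup's total weight reaches p. Other definitions exist (Hmisc::wtd.quantile, for instance, offers several through its type argument), so results can differ in general. Because ave() passes only one vector to FUN, the sketch passes row indices and looks up both A and weight inside the function.

```r
# Hypothetical helper (not from the question): weighted p-quantile defined as
# the smallest x whose cumulative share of total weight is >= p.
weighted_quantile <- function(x, w, p) {
  keep <- !is.na(x) & !is.na(w)   # drop incomplete pairs, similar to na.rm = TRUE
  x <- x[keep]
  w <- w[keep]
  o <- order(x)                   # sort the values, carrying the weights along
  cum_share <- cumsum(w[o]) / sum(w)
  x[o][which(cum_share >= p)[1]]
}

# The sample data from the question
data_frame <- data.frame(
  ID     = 1:8,
  GROUP1 = c("A", "B", "C", "A", "C", "C", "B", "A"),
  GROUP2 = c(1, 1, 1, 2, 2, 2, 1, 2),
  A      = c(25, 31, 21, 55, 8, 41, 45, 23),
  weight = c(100, 120, 70, 63, 80, 80, 120, 63)
)

# ave() hands FUN a single vector, so give it row indices and look up
# A and weight for those rows inside FUN
data_frame$demanded_column <- ave(
  seq_len(nrow(data_frame)), data_frame$GROUP1, data_frame$GROUP2,
  FUN = function(i) weighted_quantile(data_frame$A[i], data_frame$weight[i], 0.05)
)

print(data_frame$demanded_column)
```

For the sample data this gives 25, 31, 21, 23, 8, 8, 31, 23, the same values shown in the answers.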

2 answers:

Answer 0 (score: 0)

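How about this? I use split and Hmisc::wtd.quantile to compute the weighted 5% quantile of each subgroup, then use unsplit to broadcast the results back to the rows of the original data frame: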
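The broadcast step is easiest to see on a toy vector (the values below are made up for illustration): split() groups the values, lapply() reduces each group to a single number, and unsplit() writes each group's number back to every position belonging to that group.

```r
# Toy data: five values in two groups (illustrative only)
x <- c(10, 20, 30, 40, 50)
g <- c("a", "b", "a", "b", "a")

# One summary value per group: list(a = 10, b = 20)
per_group <- lapply(split(x, g), min)

# unsplit() puts each group's value back at that group's original positions
broadcast <- unsplit(per_group, g)
print(broadcast)  # 10 20 10 20 10
```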

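# Read the example table; "clipboard" is platform-dependent, so
# read.table(text = "...", header = TRUE) is a portable alternative
df <- read.table("clipboard", header=TRUE)

# Weighted 5% quantile of A for each GROUP1 x GROUP2 combination
# (drop = TRUE skips combinations that do not occur in the data)
v <- lapply(split(df, df[2:3], drop=TRUE), function(x) {
  Hmisc::wtd.quantile(x$A, x$weight, probs = 0.05, na.rm = TRUE)
})

# Broadcast each subgroup's quantile back to that subgroup's rows
df$q05 <- unsplit(v, df[2:3], drop = TRUE)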

Result:

> df
  ID GROUP1 GROUP2  A weight q05
1  1      A      1 25    100  25
2  2      B      1 31    120  31
3  3      C      1 21     70  21
4  4      A      2 55     63  23
5  5      C      2  8     80   8
6  6      C      2 41     80   8
7  7      B      1 45    120  31
8  8      A      2 23     63  23
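As a rough cross-check on what the weights mean here: my understanding of the Hmisc documentation is that wtd.quantile, with its default normwt = FALSE, treats the weights as frequency counts. Under that assumption, and assuming the weights are whole numbers, each row can be repeated weight times and an ordinary quantile() taken per subgroup. The sketch below reuses the split/unsplit pattern from this answer; q05_expanded is a hypothetical column name.

```r
# The sample data from the question
df <- data.frame(
  GROUP1 = c("A", "B", "C", "A", "C", "C", "B", "A"),
  GROUP2 = c(1, 1, 1, 2, 2, 2, 1, 2),
  A      = c(25, 31, 21, 55, 8, 41, 45, 23),
  weight = c(100, 120, 70, 63, 80, 80, 120, 63)
)

# Assumption: weights are integer frequency counts. Repeat each A value
# `weight` times, then take an ordinary (type 7) quantile per subgroup.
v <- lapply(split(df, df[c("GROUP1", "GROUP2")], drop = TRUE), function(x) {
  quantile(rep(x$A, times = x$weight), probs = 0.05, names = FALSE)
})

# Broadcast back to the original rows, as in the answer above
df$q05_expanded <- unsplit(v, df[c("GROUP1", "GROUP2")], drop = TRUE)
print(df$q05_expanded)
```

For this sample it gives 25, 31, 21, 23, 8, 8, 31, 23, matching the q05 column above. Expanding rows is only practical for small integer weights; for fractional or large weights a dedicated weighted-quantile function is the better choice.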

Answer 1 (score: 0)

You can combine dplyr and magrittr:

library(dplyr) ## loading dplyr also makes magrittr's %>% operator available

dframe <- structure(list(ID = 1:8, GROUP1 = structure(c(1L, 2L, 3L, 1L, 
3L, 3L, 2L, 1L), .Label = c("A", "B", "C"), class = "factor"), 
    GROUP2 = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L), A = c(25L, 31L, 
    21L, 55L, 8L, 41L, 45L, 23L), weight = c(100L, 120L, 70L, 
    63L, 80L, 80L, 120L, 63L)), .Names = c("ID", "GROUP1", "GROUP2", 
"A", "weight"), class = "data.frame", row.names = c(NA, -8L))

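# Group by the subgroup only; Hmisc::wtd.quantile applies the weights
# (each %>% must end a line, otherwise R ends the expression early)
new_dframe <- dframe %>%
  group_by(GROUP1, GROUP2) %>%
  mutate(demanded_column = Hmisc::wtd.quantile(A, weights = weight, probs = 0.05)[[1]])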

new_dframe

ID GROUP1 GROUP2  A weight demanded_column
  1      A      1 25    100              25
  2      B      1 31    120              31
  3      C      1 21     70              21
  4      A      2 55     63              23
  5      C      2  8     80               8
  6      C      2 41     80               8
  7      B      1 45    120              31
  8      A      2 23     63              23

I hope this helps.