Conditional ntile in the tidyverse

Time: 2017-10-31 10:03:05

Tags: r dplyr tidyverse quantile

I want to use the tidyverse to compute ntiles from only a subset of the rows. The following base R code does what I need:

Base R:

library(tidyverse)  # provides diamonds (ggplot2), ntile(), count() and %>%

diamonds$conditional_quartiles_var                           <- NA
diamonds$conditional_quartiles_var[ diamonds$price >= 1000 ] <- ntile( diamonds$price[ diamonds$price >= 1000 ], n = 4 )
diamonds$conditional_quartiles_var[ diamonds$price <  1000 ] <- "Less than 1000"

diamonds %>% count(conditional_quartiles_var)

Output (what I want):

# A tibble: 5 x 2
  conditional_quartiles_var     n
                      <chr> <int>
1                         1  9861
2                         2  9860
3                         3  9860
4                         4  9860
5            Less than 1000 14499

The result above is what I want, because the ntiles are computed only from the values where price >= 1000.

Tidyverse attempt

My tidyverse implementation fails because the ntiles are computed from the entire price vector:

library(tidyverse)


diamonds %>% 
    mutate( wrong_conditional_quartiles_var = case_when(  price >= 1000 ~   ntile(price, n = 4) %>% as.character(),
                                                          price <  1000 ~   "Less than 1000")) %>%
    count( wrong_conditional_quartiles_var)

Output (not what I want):

# A tibble: 4 x 2
  wrong_conditional_quartiles_var     n
                            <chr> <int>
1                               2 12471
2                               3 13485
3                               4 13485
4                  Less than 1000 14499
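The likely cause is that case_when() evaluates each right-hand side on the full column and only afterwards picks values by condition, so ntile(price, n = 4) still ranks every row. A minimal sketch of the difference, using a made-up price vector (the values are assumptions for illustration, not taken from diamonds):

```r
library(dplyr)

# Hypothetical prices: four below 1000, four at or above 1000.
price <- c(300, 500, 700, 900, 1200, 1500, 2000, 3000)

# Ranking the whole vector, then keeping only the rows >= 1000.
full <- ntile(price, 4)
full_kept <- full[price >= 1000]
full_kept
# 3 3 4 4  (buckets 1 and 2 were used up by the cheap rows)

# Ranking only the subset that should be bucketed.
sub_only <- ntile(price[price >= 1000], 4)
sub_only
# 1 2 3 4
```

This mirrors the output above, where bucket 1 never appears among the rows with price >= 1000.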

1 Answer:

Answer 0 (score: 1)

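We can use replace(), filling the column with the label first and then overwriting only the rows where price >= 1000: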

library(dplyr)
diamonds %>%
   mutate(quart = "Less than 1000", 
          quart = replace(quart, price >= 1000, ntile(price[price>=1000], 4))) %>%
   count(quart)
# A tibble: 5 x 2
#           quart     n
#           <chr> <int>
#1              1  9861
#2              2  9860
#3              3  9860
#4              4  9860
#5 Less than 1000 14499
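Why this works: base R's replace(x, list, values) is shorthand for x[list] <- values, so ntile() only ever sees price[price >= 1000] and its result is written back into exactly those rows. A small sketch on a made-up tibble (the name toy and its values are assumptions for illustration only):

```r
library(dplyr)

# Hypothetical data: four prices below 1000, four at or above 1000.
toy <- tibble(price = c(300, 500, 700, 900, 1200, 1500, 2000, 3000))

res <- toy %>%
  mutate(quart = "Less than 1000",
         # ntile() ranks only the 4 prices >= 1000; replace() writes them
         # into the matching rows and coerces the integers to character.
         quart = replace(quart, price >= 1000, ntile(price[price >= 1000], 4)))

res$quart
# "Less than 1000" four times, then "1" "2" "3" "4"
```

Note that the replacement vector must have as many elements as there are TRUE values in the condition, which holds here because both use the same price >= 1000 filter.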