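I want to use the tidyverse to compute ntiles from only a subset of the rows. The following base R code does what I need:
Base R (with dplyr's ntile() and count()):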
library(dplyr)    # ntile(), count() and the %>% pipe
library(ggplot2)  # diamonds dataset
diamonds$conditional_quartiles_var <- NA
diamonds$conditional_quartiles_var[ diamonds$price >= 1000 ] <- ntile( diamonds$price[ diamonds$price >= 1000 ], n = 4 )
diamonds$conditional_quartiles_var[ diamonds$price < 1000 ] <- "Less than 1000"
diamonds %>% count(conditional_quartiles_var)
Output (what I want):
# A tibble: 5 x 2
conditional_quartiles_var n
<chr> <int>
1 1 9861
2 2 9860
3 3 9860
4 4 9860
5 Less than 1000 14499
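That is the result I want: the quartiles are computed only from the rows with price >= 1000, and every row below 1000 gets the label "Less than 1000".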
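To see why computing on the subset matters, here is a minimal sketch on a made-up seven-element price vector (illustrative values, not taken from diamonds). Over the full vector, the prices >= 1000 never land in bucket 1 because the low prices occupy it, which mirrors the missing quartile 1 in the tidyverse output below; over the subset alone, they fill all four buckets.

```r
library(dplyr)

# Illustrative prices: three below 1000, four at or above 1000
price <- c(500, 800, 900, 1000, 2000, 3000, 4000)
keep  <- price >= 1000

# Quartiles over the whole vector: the low prices take up bucket 1
full_q <- ntile(price, n = 4)
full_q[keep]    # 2 3 3 4

# Quartiles over the subset only, as the base R code does
subset_q <- ntile(price[keep], n = 4)
subset_q        # 1 2 3 4
```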
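Tidyverse attempt
My tidyverse implementation fails because case_when() evaluates ntile(price, n = 4) on the whole price vector, so the quartiles are computed from all prices rather than only those >= 1000: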
library(tidyverse)
diamonds %>%
  mutate(wrong_conditional_quartiles_var = case_when(
    price >= 1000 ~ ntile(price, n = 4) %>% as.character(),
    price < 1000 ~ "Less than 1000")) %>%
  count(wrong_conditional_quartiles_var)
Output (not what I want):
# A tibble: 4 x 2
wrong_conditional_quartiles_var n
<chr> <int>
1 2 12471
2 3 13485
3 4 13485
4 Less than 1000 14499
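One way to keep the case_when() structure is to pass ntile() a vector in which prices below 1000 are masked as NA: ntile() leaves NA values as NA and assigns buckets using only the non-missing values, so the quartiles come from prices >= 1000 alone. This is a sketch assuming price is an integer column, as it is in ggplot2's diamonds (hence NA_integer_ in if_else()); the helper column q is a hypothetical name.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

result <- diamonds %>%
  mutate(
    # Hypothetical helper: quartiles computed over price >= 1000 only;
    # rows below 1000 become NA and are skipped by ntile()
    q = ntile(if_else(price >= 1000, price, NA_integer_), n = 4),
    conditional_quartiles_var = case_when(
      price >= 1000 ~ as.character(q),
      TRUE          ~ "Less than 1000"
    )
  ) %>%
  count(conditional_quartiles_var)

result
```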
Answer (score: 1)
We can use replace():
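replace() is base R: replace(x, list, values) returns a copy of x with x[list] <- values. The values are written, in order, into the positions where list is TRUE, which is why ntile(price[price >= 1000], 4) lines up with the rows where price >= 1000; since quart starts as a character vector, the numbers are coerced to character. A toy illustration with made-up prices, using base rank() in place of ntile():

```r
# Made-up prices; quart starts as the label for the "below 1000" rows
price <- c(500, 1200, 3000, 900, 2000)
quart <- rep("Less than 1000", length(price))

# rank() of the subset (1200, 3000, 2000) is 1, 3, 2; these values are
# written, in order, into the positions where price >= 1000 is TRUE
quart <- replace(quart, price >= 1000, rank(price[price >= 1000]))
quart
# "Less than 1000" "1" "3" "Less than 1000" "2"
```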
library(dplyr)
library(ggplot2)  # diamonds dataset
diamonds %>%
mutate(quart = "Less than 1000",
quart = replace(quart, price >= 1000, ntile(price[price>=1000], 4))) %>%
count(quart)
# A tibble: 5 x 2
# quart n
# <chr> <int>
#1 1 9861
#2 2 9860
#3 3 9860
#4 4 9860
#5 Less than 1000 14499
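Another option, sketched here as an alternative rather than as part of the answer above, is to group by the condition so that ntile() only sees the prices inside each group. The grouping column above_1000 is a hypothetical name. if_else() still computes ntile() for the below-1000 group, but those values are discarded in favour of the label.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

result <- diamonds %>%
  # Split rows by the condition; ntile() is then evaluated per group
  group_by(above_1000 = price >= 1000) %>%
  mutate(quart = if_else(above_1000,
                         as.character(ntile(price, 4)),
                         "Less than 1000")) %>%
  ungroup() %>%
  count(quart)

result
```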