I have a data frame like this:
library(dplyr)
set.seed(567)
year <- as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp <- c(letters[seq(from = 1, to = 20)], c("a", "b", "c"), letters[seq(from = 8, to = 20)])
freq <- rpois(36, lambda = 12)
df <- data.frame(year, lepsp, freq)
# Rank species by frequency within each year (1 = most frequent)
df <- df %>%
  group_by(year) %>%
  mutate(rank = dense_rank(-freq))
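I want to group df by year and then create a new column called quant that assigns the corresponding quantile to each freq value within its subset. The new column could define the quantiles with probs = seq(0, 1, 0.05). Most importantly, I want to be able to assign categories later based on the quantile; for example, anything below 25% would be classified as rare. So broad quartile designations would work, but the smaller the percentile increment, the more wiggle room I have. I would classify each value as rare (r) or common (c).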
The output should look like this:
df<-data.frame(df, quant= c(75,50,25,50,50,25,75,50,25,75,75,100,50,100,100,50,25,25,75,25,75,50,50,75,75,25,25,50,50,50,25,75,75,25,75,50),
abucat= c("c", "r", "r","r","r", "r","c","r","r", "c", "c", "c", "r","c", "c","r" , "r", "r", "c", "r", "c","r","r","c","c","r",
"r","r","r","r","r","c","c","r","c","r"))
I tried:
library(dplyr)
df <- df %>%
  group_by(year) %>%
  mutate(quant = quantile(freq, probs = seq(0, 1, 0.25)))
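A likely reason this attempt does not produce the desired column: quantile() returns one value per requested probability (five for quartiles), not one value per row, and mutate() cannot recycle five values across the 20 and 16 rows of the two year groups. A minimal base R sketch of that mismatch, using a 20-value sample as a stand-in for one year of freq (the variable names here are illustrative only):

```r
set.seed(567)
freq_one_year <- rpois(20, lambda = 12)  # stand-in for a single year's freq values

# quantile() returns the cut points themselves, one per probability
q <- quantile(freq_one_year, probs = seq(0, 1, 0.25))

length(q)              # number of cut points, not number of rows
names(q)               # labels of the cut points
length(freq_one_year)  # number of rows that mutate() needs to fill
```

So the cut points have to be compared against each freq value in a second step, rather than assigned directly to the column.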
Answer 0 (score: 0)
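I updated the code to use case_when to make it more intuitive. You should be able to see each case under which quant is classified, along with the corresponding value. I then use separate from tidyr to split the result into two columns.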
library(dplyr)
library(tidyr)
set.seed(567)
year= as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp= c(letters[seq(from = 1, to = 20 )], c('a','b','c'),letters[seq(from =8, to = 20 )])
freq= rpois(36, lambda=12)
df<-data.frame(year, lepsp, freq)
df <- df %>%
  group_by(year) %>%
  mutate(rank = dense_rank(-freq))
df<-data.frame(df, quant= c(75,50,25,50,50,25,75,50,25,75,75,100,50,100,100,50,25,25,75,25,75,50,50,75,75,25,25,50,50,50,25,75,75,25,75,50),
abucat= c("c", "r", "r","r","r", "r","c","r","r", "c", "c", "c", "r","c", "c","r" , "r", "r", "c", "r", "c","r","r","c","c","r",
"r","r","r","r","r","c","c","r","c","r"))
# Store each year's quartiles in a list column, then classify row by row
df %>%
  group_by(year) %>%
  mutate(qtile = list(quantile(freq))) %>%
  rowwise() %>%
  mutate(q = case_when(freq <= qtile[2] ~ "25,r",
                       freq > qtile[2] & freq <= qtile[3] ~ "50,r",
                       freq > qtile[3] & freq <= qtile[4] ~ "75,c",
                       freq > qtile[4] ~ "100,c")) %>%
  separate(q, c("quant", "abucat")) %>%  # split e.g. "25,r" into two columns
  select(-qtile)
# Source: local data frame [36 x 6]
# Groups: <by row>
#
# # A tibble: 36 x 6
# year lepsp freq rank quant abucat
# <fct> <fct> <int> <int> <chr> <chr>
# 1 1998 a 14 3 75 c
# 2 1998 b 13 4 50 r
# 3 1998 c 9 7 25 r
# 4 1998 d 12 5 50 r
# 5 1998 e 12 5 50 r
# 6 1998 f 9 7 25 r
# 7 1998 g 15 2 75 c
# 8 1998 h 12 5 50 r
# 9 1998 i 10 6 25 r
# 10 1998 j 15 2 75 c
# # ... with 26 more rows
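The same idea can be sketched without rowwise() and for finer increments such as probs = seq(0, 1, 0.05), as the question asks. This is only one possible approach, not part of the answer above: quant_bin is a hypothetical helper that returns, for each value, the smallest percentile whose quantile is at least that value, and rare_cutoff = 50 is an assumption that mirrors the r/c mapping used in the case_when call (25 and 50 are "r", 75 and 100 are "c").

```r
library(dplyr)

set.seed(567)
year  <- as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp <- c(letters[1:20], "a", "b", "c", letters[8:20])
freq  <- rpois(36, lambda = 12)
df <- data.frame(year, lepsp, freq)

# Hypothetical helper (not from the original answer): for each value of x,
# return the smallest percentile (in %) whose quantile is >= that value.
quant_bin <- function(x, probs = seq(0, 1, 0.25)) {
  # Upper cut points; the 0% quantile (the minimum) is dropped
  upper <- quantile(x, probs = probs, names = FALSE)[-1]
  # Count the cut points strictly below each value; tied cut points are fine
  idx <- vapply(x, function(v) sum(upper < v) + 1L, integer(1))
  round(100 * probs[-1][idx], 2)
}

rare_cutoff <- 50  # assumption: percentiles up to this value count as rare

result <- df %>%
  group_by(year) %>%
  mutate(quant  = quant_bin(freq, probs = seq(0, 1, 0.05)),
         abucat = if_else(quant <= rare_cutoff, "r", "c")) %>%
  ungroup()
```

Calling quant_bin(freq) with the default quartile probs should reproduce the 25/50/75/100 bins of the case_when version, with quant as a number rather than a character string. Changing rare_cutoff moves the rare/common boundary without touching the binning.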