Question

My data frame looks like this:

df
city   year   wealth
a      2001   1
a      2002   30
b      2001   2
b      2002   20
c      2001   3
c      2002   10

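I'm looking for a simple way to subset the data frame into thirds based on city wealth, with the cut-offs computed within each year rather than across all years at once. So I want output like this: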

top_third
city    year   wealth
a       2002   30
c       2001   3

mid_third
city   year    wealth
b      2001    2
b      2002    20

low_third
city   year    wealth
c      2002    10
a      2001    1

The approach I have been trying is the following:

top_third <- subset(df, wealth > quantile(wealth, 0.66, na.rm = TRUE))
non_rich  <- subset(df, wealth <= quantile(wealth, 0.66, na.rm = TRUE))
mid_third <- subset(non_rich, wealth > quantile(wealth, 0.5, na.rm = TRUE))
low_third <- subset(non_rich, wealth <= quantile(wealth, 0.5, na.rm = TRUE))

My biggest problem with this approach is that I can't find a way to compute the quantiles within each year. Does anyone know a simple way to do this?

Answer 1

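Here is an approach using the dplyr package. We group the data by year, then create a new column indicating which group (tercile) each city falls into within its year. We can then split up the data set by the new group column: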

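library(dplyr)
# Within each year, cut wealth at its 1/3 and 2/3 quantiles;
# the bins are labelled 1 (lowest third) to 3 (highest third)
df <- df %>% group_by(year) %>%
  mutate(group = cut(wealth, c(-Inf, quantile(wealth, c(1/3, 2/3)), Inf),
                     labels = 1:3))
# One data frame per group label
split(df, df$group)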
# $`1`
# Source: local data frame [2 x 4]
# Groups: year [2]

#     city  year wealth  group
#   <fctr> <int>  <int> <fctr>
# 1      a  2001      1      1
# 2      c  2002     10      1

# $`2`
# Source: local data frame [2 x 4]
# Groups: year [2]

#     city  year wealth  group
#   <fctr> <int>  <int> <fctr>
# 1      b  2001      2      2
# 2      b  2002     20      2

# $`3`
# Source: local data frame [2 x 4]
# Groups: year [2]

#     city  year wealth  group
#   <fctr> <int>  <int> <fctr>
# 1      a  2002     30      3
# 2      c  2001      3      3
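
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and maps the three list elements onto the names the question asked for. The object names `grouped` and `thirds` are my own, and the `ungroup()` call is an addition so that the pieces no longer carry the year grouping; neither is part of the original answer.

```r
library(dplyr)

# Sample data reconstructed from the question
df <- data.frame(
  city   = c("a", "a", "b", "b", "c", "c"),
  year   = c(2001, 2002, 2001, 2002, 2001, 2002),
  wealth = c(1, 30, 2, 20, 3, 10)
)

# Per-year tercile label, using the same cut()/quantile() call as above.
# ungroup() is an addition here: it drops the year grouping before splitting.
grouped <- df %>%
  group_by(year) %>%
  mutate(group = cut(wealth,
                     breaks = c(-Inf, quantile(wealth, c(1/3, 2/3)), Inf),
                     labels = 1:3)) %>%
  ungroup()

# split() returns a list named after the factor levels "1", "2", "3"
thirds <- split(grouped, grouped$group)

low_third <- thirds[["1"]]  # lowest third within each year
mid_third <- thirds[["2"]]
top_third <- thirds[["3"]]  # highest third within each year
```

One caveat: `cut()` stops with an error if the breaks are not unique, which can happen when a year has many tied wealth values, so real data may need extra handling there.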

Subset a data frame based on quantiles within groups
