Question

I have a data table with several columns, say:

Location, which may contain Los Angeles, etc.

age_Group, e.g. (young, child, teenager), etc.

year = (2000, 2001, ..., 2015)

month = c(jan, ..., dec)

I want to group_by these columns and see how many people spent money within certain intervals. Say my intervals are interval_1 = (1, 100), (100, 1000), ..., interval_20 = (1000, infinity).

How should I proceed? What should I do after the following?

data %>% group_by(location, age_Group, year, month)

Sample:

location age_gp  year month   spending
LA       child   2000   1         102
LA       teen    2000   1         15
LA       teen    2000   10        9
NY       old     2000   11        1000
NY       old     2010   2         1000000
NY       teen    2020   3         10
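For anyone who wants to try an answer, here is the sample above typed in as an R data frame that can be pasted into a session. This is a sketch: the object name `spend_df` is hypothetical, and the columns copy the sample table, including its `age_gp` name (the text above calls that column `age_Group`).

```r
# Sample data from the question, entered as a data frame.
# spend_df is a hypothetical name; columns mirror the sample table.
spend_df <- data.frame(
  location = c("LA", "LA", "LA", "NY", "NY", "NY"),
  age_gp   = c("child", "teen", "teen", "old", "old", "teen"),
  year     = c(2000, 2000, 2000, 2000, 2010, 2020),
  month    = c(1, 1, 10, 11, 2, 3),
  spending = c(102, 15, 9, 1000, 1000000, 10)
)

# Show column types and the first values
str(spend_df)
```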

Desired output:

LA, child, 2000, jan  interval_1
LA, child, 2000, feb  interval_20
...
NY, old, 2015, dec  interval_1

The last column must be determined by summing the spending of all people who share the same city, age_Group, year, and month.
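Following that rule, one way to continue after `group_by()` is to sum spending per group with `summarise()` and then bin each total with `cut()`. This is a minimal sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and the sample data from the question. The break points are hypothetical: the question elides most of its 20 intervals, so only three are used here, labelled `interval_1` to `interval_3`. `month.abb` is base R's built-in vector of month abbreviations, used to turn month numbers into `jan`, `feb`, and so on.

```r
library(dplyr)

# Sample data from the question (age column named as in the sample table)
spend_df <- data.frame(
  location = c("LA", "LA", "LA", "NY", "NY", "NY"),
  age_gp   = c("child", "teen", "teen", "old", "old", "teen"),
  year     = c(2000, 2000, 2000, 2000, 2010, 2020),
  month    = c(1, 1, 10, 11, 2, 3),
  spending = c(102, 15, 9, 1000, 1000000, 10)
)

# Hypothetical breaks: (1, 100], (100, 1000], (1000, Inf)
breaks <- c(1, 100, 1000, Inf)
labels <- paste0("interval_", seq_len(length(breaks) - 1))

result <- spend_df %>%
  group_by(location, age_gp, year, month) %>%
  # One row per group: total spending of everyone in that group
  summarise(total = sum(spending), .groups = "drop") %>%
  mutate(
    # Month number -> lowercase abbreviation, e.g. 1 -> "jan"
    month    = tolower(month.abb[month]),
    # Assign each group total to its interval
    interval = cut(total, breaks = breaks, labels = labels)
  )

print(result)
```

`cut()` uses right-closed intervals by default, so a total of exactly 1 would fall outside `(1, 100]` and become `NA`; passing `include.lowest = TRUE` closes the first interval. Combinations with no rows at all (for example a month in which nobody in a group spent anything) do not appear in the result.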

Answer 1

You can first create a new column (spending_cat) with the cut function. You can then add the new variable as a grouping variable and simply count:

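library(dplyr)

# Toy data: 1000 rows, a random group label a-d and a normally distributed amount
df <- data.frame(group = sample(letters[1:4], size = 1000, replace = TRUE),
                 spending = rnorm(1000))

df %>% 
  # Bin spending into unit-width, right-closed intervals (-5,-4], ..., (4,5]
  mutate(spending_cat = cut(spending, breaks = -5:5)) %>%
  # Count rows per group and spending bin
  group_by(group, spending_cat) %>%
  summarise(n_people = n())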

# A tibble: 26 x 3
# Groups:   group [?]
   group spending_cat n_people
   <fct> <fct>           <int>
 1 a     (-3,-2]             6
 2 a     (-2,-1]            36
 3 a     (-1,0]             83
 4 a     (0,1]              78
 5 a     (1,2]              23
 6 a     (2,3]              10
 7 b     (-4,-3]             1
 8 b     (-3,-2]             4
 9 b     (-2,-1]            40
10 b     (-1,0]             78
# … with 16 more rows
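One caveat about the binning above: `cut()` returns `NA` for any value outside the outermost breaks, so with `breaks = -5:5` a draw beyond ±5 would quietly end up in an `NA` bin. Using `-Inf` and `Inf` as the outer breaks avoids this. A small base-R check; the values in `x` are made up for illustration:

```r
# Made-up values, two of them outside the range -5..5
x <- c(-7, -0.5, 0, 0.5, 7)

# Finite outer breaks: -7 and 7 fall outside (-5, 5] and become NA
finite <- cut(x, breaks = -5:5)

# Open-ended outer breaks: every value lands in some interval
open <- cut(x, breaks = c(-Inf, -4:4, Inf))

print(finite)
print(open)
```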
