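I am trying to aggregate data grouped by multiple columns, where the grouping itself is a function of another column. Here is a sample of my dataset: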
STATE NAICS ENTRSIZE FIRM
0 11 2 14869
0 11 3 3472
0 11 4 1656
0 11 6 1119
0 11 9 84
0 21 2 12623
0 21 3 3203
1 11 2 14869
1 11 7 54
1 11 9 12
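What I want is, for each STATE/NAICS pair, to sum FIRM according to the ENTRSIZE value. Basically, I want to combine ENTRSIZE values 2, 3, 4, and 6 into "SMALL", 7 into "MEDIUM", and 9 into "LARGE", so my final data should look like this: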
STATE NAICS ENTRSIZE FIRM
0 11 SMALL 21116
0 11 LARGE 84
0 21 SMALL 15826
1 11 SMALL 14869
1 11 MEDIUM 54
1 11 LARGE 12
Answer 0 (score: 0)
This works on the sample data:
> df %>%
+   mutate(ENTRSIZE = case_when(ENTRSIZE %in% 2:6 ~ 'SMALL',
+                               ENTRSIZE == 7 ~ 'MEDIUM',
+                               TRUE ~ 'LARGE')) %>%
+   group_by(STATE, NAICS, ENTRSIZE) %>%
+   summarise(FIRM = sum(FIRM))
`summarise()` regrouping output by 'STATE', 'NAICS' (override with `.groups` argument)
# A tibble: 6 x 4
# Groups: STATE, NAICS [3]
  STATE NAICS ENTRSIZE  FIRM
  <dbl> <dbl> <chr>    <dbl>
1     0    11 LARGE       84
2     0    11 SMALL    21116
3     0    21 SMALL    15826
4     1    11 LARGE       12
5     1    11 MEDIUM      54
6     1    11 SMALL    14869
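One caveat with the snippet above: `2:6` also matches an ENTRSIZE of 5, and the `TRUE` branch labels every remaining code as LARGE, not just 9. If the full dataset contains other codes, a stricter mapping may be safer. The following is only a sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and rebuilding the sample data by hand as `df`; the name `result` and the choice to wrap the labels in a factor are mine, not part of the original answer.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  STATE    = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
  NAICS    = c(11, 11, 11, 11, 11, 21, 21, 11, 11, 11),
  ENTRSIZE = c(2, 3, 4, 6, 9, 2, 3, 2, 7, 9),
  FIRM     = c(14869, 3472, 1656, 1119, 84, 12623, 3203, 14869, 54, 12)
)

result <- df %>%
  mutate(ENTRSIZE = factor(
    case_when(
      ENTRSIZE %in% c(2, 3, 4, 6) ~ "SMALL",
      ENTRSIZE == 7               ~ "MEDIUM",
      ENTRSIZE == 9               ~ "LARGE"
      # any other code becomes NA instead of being counted as LARGE
    ),
    # factor levels control the sort order of the grouped output
    levels = c("SMALL", "MEDIUM", "LARGE")
  )) %>%
  group_by(STATE, NAICS, ENTRSIZE) %>%
  summarise(FIRM = sum(FIRM), .groups = "drop")

print(result)
```

Because ENTRSIZE is a factor here, rows within each STATE/NAICS pair should come out in SMALL, MEDIUM, LARGE order, as in the desired output in the question, and `.groups = "drop"` returns an ungrouped tibble without the regrouping message. Any unexpected code would show up as its own NA group, which makes it easy to spot.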