I have a data table like the following:
city    year temp
Seattle 2019 82
Seattle 2018 10
NYC     2010 78
DC      2011 71
DC      2011 10
DC      2018 60
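I want to group the rows by city and year, and then build a new table from this one that shows, for example, how many years Seattle had a temperature between 10 and 20, how many years it had a temperature between 20 and 30, and so on.
How can I do this?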
Answer 0 (score: 1)
We can use cut to split temp into bins, then summarize by city and temp_range:
library(dplyr)
df %>%
mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
group_by(city, temp_range) %>%
summarize(years = n_distinct(year))
Output:
# A tibble: 6 x 3
# Groups:   city [3]
  city    temp_range years
  <fct>   <fct>      <int>
1 DC      (0,10]         1
2 DC      (50,60]        1
3 DC      (70,80]        1
4 NYC     (70,80]        1
5 Seattle (0,10]         1
6 Seattle (80,90]        1
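Note that cut builds right-closed intervals by default, which is why a temperature of exactly 10 is counted in (0,10] above rather than in a 10-to-20 range. If the ranges should include their lower bound instead, as the question's wording ("between 10 and 20") might suggest, one option is right = FALSE. A minimal base R sketch, assuming left-closed bins are what is wanted (the names default_bins and left_closed_bins are only illustrative):

```r
# Temperatures from the sample data
temp <- c(82, 10, 78, 71, 10, 60)

# Default: intervals are right-closed, so 10 falls in (0,10]
default_bins <- cut(temp, breaks = seq(0, 100, 10))

# right = FALSE: intervals are left-closed, so 10 falls in [10,20)
left_closed_bins <- cut(temp, breaks = seq(0, 100, 10), right = FALSE)

# Compare the two binnings side by side
print(data.frame(temp, default_bins, left_closed_bins))
```

The same right = FALSE argument can be passed to cut inside the mutate call. With left-closed bins, a value equal to the last break (100 here) becomes NA unless include.lowest = TRUE is also set.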
As of dplyr 0.8.0, we can also keep empty factor levels by setting the new .drop argument of group_by to FALSE:
df %>%
mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
group_by(city, temp_range, .drop = FALSE) %>%
summarize(years = n_distinct(year))
Output:
# A tibble: 30 x 3
# Groups:   city [3]
   city  temp_range years
   <fct> <fct>      <int>
 1 DC    (0,10]         1
 2 DC    (10,20]        0
 3 DC    (20,30]        0
 4 DC    (30,40]        0
 5 DC    (40,50]        0
 6 DC    (50,60]        1
 7 DC    (60,70]        0
 8 DC    (70,80]        1
 9 DC    (80,90]        0
10 DC    (90,100]       0
# ... with 20 more rows
Data:
df <- structure(list(city = structure(c(3L, 3L, 2L, 1L, 1L, 1L), .Label = c("DC",
"NYC", "Seattle"), class = "factor"), year = c(2019L, 2018L,
2010L, 2011L, 2011L, 2018L), temp = c(82L, 10L, 78L, 71L, 10L,
60L)), class = "data.frame", row.names = c(NA, -6L))