group_by and count the number of rows that meet a condition in R

Asked: 2019-03-01 15:33:45

Tags: r group-by conditional-statements

I have a data table like the following:

city         year    temp
Seattle      2019    82 
Seattle      2018    10 
NYC          2010    78 
DC           2011    71 
DC           2011    10 
DC           2018    60 

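I want to group the rows by city and year, and then build a new table from this one. For example, it should show how many years Seattle had a temperature between 10 and 20, how many years it had a temperature between 20 and 30, and so on.

How can I do this?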

1 Answer:

Answer 0: (score: 1)

We can use cut to divide temp into bins, then summarize by city and temp_range:

library(dplyr)

df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range) %>%
  summarize(years = n_distinct(year))

Output:

# A tibble: 6 x 3
# Groups:   city [3]
  city    temp_range years
  <fct>   <fct>      <int>
1 DC      (0,10]         1
2 DC      (50,60]        1
3 DC      (70,80]        1
4 NYC     (70,80]        1
5 Seattle (0,10]         1
6 Seattle (80,90]        1
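
One detail worth checking is which bin a boundary value lands in. By default, cut builds right-closed intervals, so a temp of 10 is placed in (0,10] rather than in a "10 to 20" bin. The following is a minimal base R sketch, not part of the original answer, that compares the default with right = FALSE on the same temp values; the variable names default_bins and left_closed_bins are illustrative only.

```r
# temp values from the sample data
temp <- c(82L, 10L, 78L, 71L, 10L, 60L)
breaks <- seq(0, 100, 10)

# Default (right = TRUE): intervals look like (a,b], so 10 falls in (0,10]
default_bins <- cut(temp, breaks = breaks)

# right = FALSE: intervals look like [a,b), so 10 falls in [10,20)
left_closed_bins <- cut(temp, breaks = breaks, right = FALSE)

# Side-by-side comparison of the two labelings
data.frame(temp,
           default = as.character(default_bins),
           left_closed = as.character(left_closed_bins))
```

If "between 10 and 20" is meant to include 10, passing right = FALSE to cut in the pipeline above is one way to get that. Note that with right = FALSE a value equal to the last break (100 here) becomes NA unless include.lowest = TRUE is also set.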

With dplyr 0.8.0, we can also keep empty factor levels by setting the new .drop argument of group_by to FALSE:

df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range, .drop = FALSE) %>%
  summarize(years = n_distinct(year))

Output:

# A tibble: 30 x 3
# Groups:   city [3]
   city  temp_range years
   <fct> <fct>      <int>
 1 DC    (0,10]         1
 2 DC    (10,20]        0
 3 DC    (20,30]        0
 4 DC    (30,40]        0
 5 DC    (40,50]        0
 6 DC    (50,60]        1
 7 DC    (60,70]        0
 8 DC    (70,80]        1
 9 DC    (80,90]        0
10 DC    (90,100]       0
# ... with 20 more rows
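
For a wide view of the same counts (one row per city, one column per bin), here is a minimal base R sketch that is not part of the original answer. It rebuilds the sample data as a plain data.frame, and the names dedup and counts are illustrative. table() keeps every level of the temp_range factor, so empty bins appear as zeros without any extra argument.

```r
# Sample data, rebuilt as a plain data.frame
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  temp = c(82L, 10L, 78L, 71L, 10L, 60L)
)

# Bin the temperatures the same way as in the dplyr pipeline
df$temp_range <- cut(df$temp, breaks = seq(0, 100, 10))

# Keep one row per (city, year, temp_range) so each year counts once per bin,
# mirroring n_distinct(year)
dedup <- unique(df[c("city", "year", "temp_range")])

# Cross-tabulate: rows are cities, columns are all ten bins (empty ones included)
counts <- table(dedup$city, dedup$temp_range)
counts
```

The deduplication step matters only when a city has several readings in the same year and the same bin; in the sample data it changes nothing.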

Data:

df <- structure(list(city = structure(c(3L, 3L, 2L, 1L, 1L, 1L), .Label = c("DC", 
"NYC", "Seattle"), class = "factor"), year = c(2019L, 2018L, 
2010L, 2011L, 2011L, 2018L), temp = c(82L, 10L, 78L, 71L, 10L, 
60L)), class = "data.frame", row.names = c(NA, -6L))