Using cut in R so that intervals with no matches are included

Time: 2019-04-13 21:51:21

Tags: r

I have a dataset like this:

sum_col   city    scen    model   time_period   chill_season
110.02     NY      RCP_8   bcc     2076_2099     season_2085_2086
91.26      NY      RCP_8   bcc     2076_2099     season_2086_2087
91.05      NY      RCP_8   bcc     2076_2099     season_2087_2088
74.96      NY      RCP_8   bcc     2076_2099     season_2088_2089
77.97      NY      RCP_8   bcc     2076_2099     season_2089_2090
109.05     NY      RCP_8   bcc     2076_2099     season_2090_2091

I want to cut the sum_col column and count how many times the values fall within each interval of bks = c(-300, seq(20, 75, 5), 300).
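To make the binning concrete, here is a minimal sketch of what cut does with these breaks. It uses only the six sum_col values shown in the sample above; the vector name sum_col is just a stand-in for the column.

```r
# Sample values copied from the sum_col column of the table above
sum_col <- c(110.02, 91.26, 91.05, 74.96, 77.97, 109.05)
bks <- c(-300, seq(20, 75, 5), 300)

# cut() returns a factor; by default each interval is right-closed, i.e. (a, b]
thresh_range <- cut(sum_col, breaks = bks)

# Show which interval each value was assigned to
as.character(thresh_range)
```

Only 74.96 lands in (70,75]; the other five values land in (75,300].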

However, when I try the following:

result <- dt %>%
          mutate(thresh_range = cut(sum_col, breaks = bks)) %>%
          group_by(time_period, thresh_range, model, scen, city) %>%
          summarize(no_years = n_distinct(chill_season, na.rm = FALSE)) %>% 
          data.table()

My result looks like this:

time_period   thresh_range  model   scen    city   no_years
  2076_2099      (70,75]      bcc   RCP_8     NY     1
  2076_2099     (75,300]      bcc   RCP_8     NY     5

So the intervals below 70, such as (20, 25), (25, 30), are not created (because no rows of data fall within them).
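As a side check, the sketch below suggests that cut itself still defines every interval as a factor level, so the empty intervals seem to disappear at the grouping step, not in cut. It reuses the six sample values; the names n_defined, n_observed and empty_levels are made up for illustration.

```r
# Same sample values and breaks as in the question
sum_col <- c(110.02, 91.26, 91.05, 74.96, 77.97, 109.05)
bks <- c(-300, seq(20, 75, 5), 300)
thresh_range <- cut(sum_col, breaks = bks)

# Number of intervals defined by bks (levels of the factor)
n_defined <- nlevels(thresh_range)

# Number of intervals that actually contain at least one value
n_observed <- length(unique(thresh_range))

# Intervals that are defined but contain no data
empty_levels <- setdiff(levels(thresh_range), as.character(thresh_range))

n_defined
n_observed
empty_levels
```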

Is there any way to tell cut to return zero for these intervals?

Again, note that a row like the following:

 a_value_leass_than_70_here  NY   RCP_8  bcc 2076_2099  chill_2076_2077

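whose corresponding sum_col is less than 70, does not exist in the data. However, I wonder whether, for such nonexistent data, cut could create an NA or 0 telling us that the temperature in NY, with those parameters, was indeed not in the (20, 25) interval.

Most importantly, I want to know for how many years each city with the given parameters (model, scen, etc.) falls in each interval (20, 25), (25,30), etc.

If any other suggestion works, that would be fine too.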

1 answer:

Answer 0: (score: 2)

You can use the complete function from the tidyr package to create NA rows for the missing combinations of data:

library(dplyr)
library(tidyr)
result <- dt %>%
          mutate(thresh_range = cut(sum_col, breaks = bks)) %>%
          complete(time_period, thresh_range, model, scen, city) %>%
          group_by(time_period, thresh_range, model, scen, city) %>%
          summarize(no_years = n_distinct(chill_season, na.rm = TRUE)) 
result
# # A tibble: 13 x 6
# # Groups:   time_period, thresh_range, model, scen [?]
#    time_period thresh_range model scen  city  no_years
#    <chr>       <fct>        <chr> <chr> <chr>    <int>
#  1 2076_2099   (-300,20]    bcc   RCP_8 NY           0
#  2 2076_2099   (20,25]      bcc   RCP_8 NY           0
#  3 2076_2099   (25,30]      bcc   RCP_8 NY           0
#  4 2076_2099   (30,35]      bcc   RCP_8 NY           0
#  5 2076_2099   (35,40]      bcc   RCP_8 NY           0
#  6 2076_2099   (40,45]      bcc   RCP_8 NY           0
#  7 2076_2099   (45,50]      bcc   RCP_8 NY           0
#  8 2076_2099   (50,55]      bcc   RCP_8 NY           0
#  9 2076_2099   (55,60]      bcc   RCP_8 NY           0
# 10 2076_2099   (60,65]      bcc   RCP_8 NY           0
# 11 2076_2099   (65,70]      bcc   RCP_8 NY           0
# 12 2076_2099   (70,75]      bcc   RCP_8 NY           1
# 13 2076_2099   (75,300]     bcc   RCP_8 NY           5
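
A possible alternative, sketched here under some assumptions, is to keep the empty factor levels at the grouping step with group_by(..., .drop = FALSE). This assumes dplyr 0.8.0 or later, and the data frame dt below is rebuilt by hand from the sample rows in the question. The factor thresh_range is placed last in group_by so that its levels are expanded within each observed combination of the other columns.

```r
library(dplyr)

# Sample data rebuilt from the rows shown in the question (an assumption:
# the real dt has more rows and may be a data.table)
dt <- data.frame(
  sum_col = c(110.02, 91.26, 91.05, 74.96, 77.97, 109.05),
  city = "NY", scen = "RCP_8", model = "bcc", time_period = "2076_2099",
  chill_season = paste0("season_", 2085:2090, "_", 2086:2091),
  stringsAsFactors = FALSE
)
bks <- c(-300, seq(20, 75, 5), 300)

result <- dt %>%
  mutate(thresh_range = cut(sum_col, breaks = bks)) %>%
  # .drop = FALSE keeps groups for factor levels that have no rows
  group_by(time_period, model, scen, city, thresh_range, .drop = FALSE) %>%
  # n_distinct() of an empty group is 0
  summarize(no_years = n_distinct(chill_season))

result
```

With this approach there are no NA placeholder rows to account for, so na.rm is not needed in n_distinct.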