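I would like to know the number of days, by group (county and year), on which the temperature exceeded selected threshold percentiles (the 90th, 95th, and 98th). Here is a sample of my data frame (heatind):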
dataValue year reportDate geo
53.4 1990 1990-05-01 Billings
59.7 1990 1990-05-01 Missoula
58.8 1991 1990-05-02 Billings
65.9 1991 1990-05-02 Missoula
However, my dataset is large (110,799 rows). I found a great tutorial on how to calculate quantiles by group. I used the following code to do this:
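# Load the packages that provide map(), partial(), set_names(), and the dplyr verbs
library(dplyr)
library(purrr)

# Percentiles to calculate
p <- c(0.90, 0.95, 0.98)

# Create a name for each percentile, e.g. "90%"
p_names <- map_chr(p, ~ paste0(.x * 100, "%"))

# Create a named list of quantile functions, one per percentile
p_funs <- map(p, ~ partial(quantile, probs = .x, na.rm = TRUE)) %>%
  set_names(nm = p_names)

# Calculate percentiles by group
# (across() replaces the deprecated summarize_at()/funs() combination)
percentile <- heatind %>%
  group_by(year, geo) %>%
  summarize(across(dataValue, p_funs), .groups = "drop")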
percentile:
year geo dataValue_90% dataValue_95% dataValue_98%
1990 Billings 85.7 86.84 92.69
1990 Missoula 89.26 90.60 92.56
1991 Billings 87.90 89.14 97.06
1991 Missoula 81.43 88.50 91.57
How can I go one step further and determine the number of observations in heatind that exceed each percentile threshold?
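To make the goal concrete, here is a minimal sketch of one way the counting step might look. It assumes the thresholds are computed within each year/geo group. The data frame heatind_demo and the column names n_above_90, n_above_95, and n_above_98 are made up for illustration and are not part of my real data.

```r
library(dplyr)

# Hypothetical toy data with the same columns as heatind (values are invented)
heatind_demo <- tibble(
  geo       = rep(c("Billings", "Missoula"), each = 10),
  year      = 1990,
  dataValue = c(1:10, 11:20)
)

# For each year/geo group, compute each percentile threshold and count
# how many observations in that same group lie strictly above it
exceed_counts <- heatind_demo %>%
  group_by(year, geo) %>%
  summarize(
    n_above_90 = sum(dataValue > quantile(dataValue, 0.90, na.rm = TRUE), na.rm = TRUE),
    n_above_95 = sum(dataValue > quantile(dataValue, 0.95, na.rm = TRUE), na.rm = TRUE),
    n_above_98 = sum(dataValue > quantile(dataValue, 0.98, na.rm = TRUE), na.rm = TRUE),
    .groups = "drop"
  )

print(exceed_counts)
```

Replacing heatind_demo with heatind should apply the same idea to the full data. Note that when the threshold and the count come from the same group, roughly 10%, 5%, and 2% of that group's observations will exceed the thresholds by construction. If the thresholds should instead come from a different grouping (for example, per geo across all years), one option might be to compute them separately, attach them with left_join(), and then count per year and geo.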