Function to compute cumsum for multiple conditions

Time: 2018-04-07 21:32:09

Tags: r dplyr data.table tidyverse cumsum

I have a data frame as shown below:

set.seed(123)
df <- data.frame(loc.id = rep(1:10, each = 101*10), 
             year = rep(rep(2001:2010, each = 101), times = 10),
             day = rep(rep(250:350, times = 10), times = 10),
             ref.rain = rep(c(400,500,450,430,470,576,644,230,850,690), each = 10*101),
             rain = runif(min = 0, max = 20, 10*101*10))

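The data frame contains data for 10 locations. For each location, I have rainfall data from doy (day of year) 250 to doy 350 for the years 2001 to 2010. ref.rain is the reference rainfall for each location; it is the same across all years for a given location but differs between the 10 locations.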
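As a quick sanity check of the layout described above, the sketch below rebuilds the same data frame and confirms the group sizes. It uses only base R, and the name `group_sizes` is introduced here purely for illustration.

```r
# Rebuild the example data exactly as defined above
set.seed(123)
df <- data.frame(loc.id = rep(1:10, each = 101*10),
                 year = rep(rep(2001:2010, each = 101), times = 10),
                 day = rep(rep(250:350, times = 10), times = 10),
                 ref.rain = rep(c(400,500,450,430,470,576,644,230,850,690), each = 10*101),
                 rain = runif(min = 0, max = 20, 10*101*10))

# Rows per location-year combination (illustrative name, not from the original post)
group_sizes <- table(df$loc.id, df$year)

nrow(df)                  # total number of rows
all(group_sizes == 101)   # TRUE if every location-year has 101 days (doy 250 to 350)

# TRUE if ref.rain takes a single value within each location
all(tapply(df$ref.rain, df$loc.id, function(v) length(unique(v))) == 1)
```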

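For each location and each year, I want to determine the number of days (counting from day 250) it takes for the cumulative rainfall to reach 1%, 2%, 3%, ... 5% of that location's reference rainfall. This is what I did: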

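library(dplyr)
library(data.table)

# define a function which does the job:
# x = daily rain, y = threshold; returns the index of the first day on which
# cumsum(x) reaches y, or NA if the threshold is never reached
my.fun <- function(x,y){ifelse(sum(cumsum(x) >= y) == 0, NA, which.max(cumsum(x) >= y))} 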

df1 <- data.table(df %>% group_by(loc.id,year) %>% 
            mutate(rain.01 = ref.rain*0.01, # calculate 1% of the ref.rain
                   rain.02 = ref.rain*0.02,
                   rain.03 = ref.rain*0.03,
                   rain.04 = ref.rain*0.04,
                   rain.05 = ref.rain*0.05) %>% 
            summarise(days2rain01 = my.fun(rain,rain.01), # apply the function that gives the no. of days to reach 1% 
                      days2rain02 = my.fun(rain,rain.02),
                      days2rain03 = my.fun(rain,rain.03),
                      days2rain04 = my.fun(rain,rain.04),
                      days2rain05 = my.fun(rain,rain.05)))
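To make the behaviour of `my.fun` concrete, here is a small worked example on a hypothetical five-day rainfall vector (`toy_rain` is invented for illustration and is not part of the data above). The result is a position within the group, so with days running from 250 to 350 a result of `i` would correspond to doy `249 + i`.

```r
# Same definition as my.fun above
my.fun <- function(x, y){ifelse(sum(cumsum(x) >= y) == 0, NA, which.max(cumsum(x) >= y))}

toy_rain <- c(2, 0, 3, 5, 1)   # hypothetical daily rainfall
cumsum(toy_rain)               # running total: 2 2 5 10 11

my.fun(toy_rain, 5)            # 3: the running total first reaches 5 on the third day
my.fun(toy_rain, 50)           # NA: the running total never reaches 50
```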

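My problem is that I want my.fun to be flexible enough that I can compute the no. of days for any % of rainfall (1%, 2%, 3%, ... 50%). At present, if I want to compute more percentages, I have to add an extra rain.XX = ref.rain*XX argument and then another days2rainXX = my.fun(rain,rain.XX) argument. How can I write the function so that it takes a vector of percentages and produces the results?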

1 Answer:

Answer 0: (score: 1)

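library(dplyr)
# Create vector of percents
pct <- seq(0.01, 0.05, 0.01)
# Create reference rainfall columns (rain0.01, rain0.02, ...), one per percent
df[paste0('rain', pct)] <- lapply(pct, `*`, df$ref.rain)
# summarise at new columns, with grouping;
# each new column is passed to my.fun as y, and the rain column as x
# (note: summarise_at() is superseded by across() in dplyr >= 1.0.0)
df %>% 
    group_by(loc.id, year) %>%  
    summarise_at(paste0('rain', pct), my.fun, x = as.name('rain'))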

I'm not sure whether this is any faster or clearer, but your function could also be

myfun <- function(x, y) which(cumsum(x) >= y)[1]
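Building on the `which(...)[1]` form, one possible way to accept a whole vector of percentages is sketched below in base R. This is only an illustration under stated assumptions, not the answerer's method: the helper `days_to_pct`, the toy data frame `toy`, and the `days2rainXX` output names are all hypothetical.

```r
# Hypothetical helper: days needed for cumulative rain to reach each
# fraction in `pct` of the reference rainfall `ref` (NA if never reached)
days_to_pct <- function(rain, ref, pct) {
  total <- cumsum(rain)
  out <- vapply(pct, function(p) which(total >= ref * p)[1], integer(1))
  names(out) <- paste0("days2rain", sprintf("%02d", round(pct * 100)))
  out
}

toy_rain <- c(2, 0, 3, 5, 1)            # hypothetical daily rainfall
pct <- c(0.01, 0.04, 0.08, 0.50)        # any vector of fractions works
days_to_pct(toy_rain, ref = 100, pct = pct)

# Apply per location-year on a small made-up data frame
toy <- data.frame(loc.id = rep(1:2, each = 5),
                  year = 2001,
                  ref.rain = rep(c(100, 200), each = 5),
                  rain = c(toy_rain, 10, 10, 10, 10, 10))
res <- lapply(split(toy, list(toy$loc.id, toy$year)), function(g)
  days_to_pct(g$rain, g$ref.rain[1], pct = pct))
res_mat <- do.call(rbind, res)          # one row per location-year
res_mat
```

Computing `cumsum(rain)` once per group and reusing it for every percentage avoids creating one helper column per threshold, so extending to 50 percentages only means lengthening `pct`.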