Applying a function to multiple groups with dplyr

Time: 2018-05-17 14:26:55

Tags: r function dplyr tidyr

I have data for multiple locations and years:

big.data <- data.frame(loc.id = rep(1:3, each = 10*3), 
                   year = rep(rep(1981:1983, each = 10),times = 3), 
                   day = rep(1:10, times = 3*3),
                   CN = rep(c(50,55,58), each = 10*3),
                   top.FC = rep(c(72,76,80),each = 10*3),
                   DC = rep(c(0.02,0.5,0.8), each = 10*3),
                   WAT0 = rep(c(20,22,26), each = 10*3),
                   Precp = sample(1:100,90, replace = TRUE),
                   ETo = sample(1:10,90, replace = TRUE))

I have a function, water.model, which internally calls a second function, water.update:

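water.model <- function(dat){

     # top.FC, CN and DC are assumed constant within the data passed in
     top.FC  <- unique(dat$top.FC)

     # placeholder values, overwritten in the loop below
     dat$WAT <- -9.9
     dat$RO <- -9.9
     dat$DR <- -9.9

     dat$WAT[1] <- top.FC/2 # initial WAT (WAT.i) is a constant: half of top.FC
     dat$RO[1] <- NA
     dat$DR[1] <- NA

     # seq_len() skips the loop when dat has only one row
     for(d in seq_len(nrow(dat) - 1)){

       # day d + 1 is computed from WAT on day d; columns are addressed by name
       dat[d + 1, c("WAT", "RO", "DR")] <- water.update(WAT0 = dat$WAT[d],
                                        RAIN.i = dat$Precp[d + 1],
                                        ETo.i = dat$ETo[d + 1],
                                        CN = unique(dat$CN),
                                        DC = unique(dat$DC),
                                        top.FC = top.FC)
     }
     return(dat)
   }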



water.update <- function(WAT0, RAIN.i, ETo.i, CN, DC, top.FC){

        S <- 25400/CN - 254
        IA <- 0.2 * S

        # runoff (RO) occurs only when rain exceeds IA
        if (RAIN.i > IA) {
          RO <- (RAIN.i - IA)^2/(RAIN.i + 0.8 * S)
        } else {
          RO <- 0
        }

        # drainage (DR) is a fraction DC of the water above top.FC
        if (WAT0 + RAIN.i - RO > top.FC) {
          DR <- DC * (WAT0 + RAIN.i - RO - top.FC)
        } else {
          DR <- 0
        }

        dWAT <- RAIN.i - RO - DR - ETo.i
        WAT1 <- max(WAT0 + dWAT, 0) # WAT cannot go below zero
        return(list(WAT1, RO, DR))
    }

If I run the above function for a single location and year:

big.data.sub <- big.data[big.data$loc.id == 1 & big.data$year == 1981,]
water.model(big.data.sub)

   loc.id year day CN top.FC   DC WAT0 Precp ETo      WAT        RO       DR
   1       1 1981   1 50     72 0.02   20    52   5  36.0000        NA       NA
   2       1 1981   2 50     72 0.02   20    12   9  39.0000 0.0000000 0.000000
   3       1 1981   3 50     72 0.02   20     3   2  40.0000 0.0000000 0.000000
   4       1 1981   4 50     72 0.02   20    81   9 107.8750 3.2091485 0.915817
   5       1 1981   5 50     72 0.02   20    37  10 133.4175 0.0000000 1.457501
   6       1 1981   6 50     72 0.02   20    61   7 184.5833 0.3937926 2.440475
   7       1 1981   7 50     72 0.02   20    14  10 186.0516 0.0000000 2.531665
   8       1 1981   8 50     72 0.02   20     9   6 186.5906 0.0000000 2.461032
   9       1 1981   9 50     72 0.02   20    77   9 248.3579 2.4498216 3.782815
   10      1 1981  10 50     72 0.02   20    18   6 256.4708 0.0000000 3.887159
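As a sanity check on a single step, row 4 of the output above can be reproduced by calling water.update by hand with the WAT value from row 3 (40) and the Precp and ETo values from row 4 (81 and 9). This is only a minimal sketch: the function body is repeated from the question so that the snippet runs on its own, and `step` is just an illustrative variable name.

```r
# water.update as defined in the question, repeated so the snippet is self-contained
water.update <- function(WAT0, RAIN.i, ETo.i, CN, DC, top.FC) {
  S <- 25400 / CN - 254
  IA <- 0.2 * S
  if (RAIN.i > IA) {
    RO <- (RAIN.i - 0.2 * S)^2 / (RAIN.i + 0.8 * S)
  } else {
    RO <- 0
  }
  if (WAT0 + RAIN.i - RO > top.FC) {
    DR <- DC * (WAT0 + RAIN.i - RO - top.FC)
  } else {
    DR <- 0
  }
  dWAT <- RAIN.i - RO - DR - ETo.i
  WAT1 <- WAT0 + dWAT
  WAT1 <- ifelse(WAT1 < 0, 0, WAT1)
  list(WAT1, RO, DR)
}

# Inputs for day 4 of loc.id 1, year 1981:
# WAT on day 3 is 40; Precp and ETo on day 4 are 81 and 9
step <- water.update(WAT0 = 40, RAIN.i = 81, ETo.i = 9,
                     CN = 50, DC = 0.02, top.FC = 72)

# Order of the returned values is WAT, RO, DR
round(unlist(step), 4)
# WAT = 107.8750, RO = 3.2091, DR = 0.9158 (rounded), matching row 4 above
```

With CN = 50, S is 254 and IA is 50.8, so the 81 units of rain exceed IA and produce runoff; the water above top.FC then drains at the rate DC.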

How can I run this for all locations and years?

big.data %>% group_by(loc.id, year) %>% # apply my function here.

My final data should look like the output above, with three new columns named WAT, RO and DR that are generated when the function runs.

1 Answer:

Answer 0 (score: 2)

We can split the data and apply water.model by looping over the resulting list with map:

library(tidyverse)
split(big.data, big.data[c('loc.id', 'year')], drop = TRUE) %>% 
           map_df(water.model)

Or apply the function inside do after group_by:

big.data %>%
   group_by(loc.id, year) %>%
   do(data.frame(water.model(.)))
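
In more recent dplyr releases do() is marked as superseded, and group_modify() is a common replacement. One caveat: group_modify() passes each group to the function without its grouping columns, so the positional index 10:12 used in the question's water.model would no longer point at WAT, RO and DR. The following is a minimal sketch, assuming dplyr 0.8.1 or later; water.model2 is a hypothetical variant written for this example (not part of the original post) that builds the three columns as vectors and attaches them by name.

```r
library(dplyr)

# Same example data as in the question (Precp and ETo are random draws)
big.data <- data.frame(loc.id = rep(1:3, each = 10 * 3),
                       year = rep(rep(1981:1983, each = 10), times = 3),
                       day = rep(1:10, times = 3 * 3),
                       CN = rep(c(50, 55, 58), each = 10 * 3),
                       top.FC = rep(c(72, 76, 80), each = 10 * 3),
                       DC = rep(c(0.02, 0.5, 0.8), each = 10 * 3),
                       WAT0 = rep(c(20, 22, 26), each = 10 * 3),
                       Precp = sample(1:100, 90, replace = TRUE),
                       ETo = sample(1:10, 90, replace = TRUE))

# water.update as defined in the question
water.update <- function(WAT0, RAIN.i, ETo.i, CN, DC, top.FC) {
  S <- 25400 / CN - 254
  IA <- 0.2 * S
  RO <- if (RAIN.i > IA) (RAIN.i - IA)^2 / (RAIN.i + 0.8 * S) else 0
  excess <- WAT0 + RAIN.i - RO - top.FC
  DR <- if (excess > 0) DC * excess else 0
  WAT1 <- max(WAT0 + RAIN.i - RO - DR - ETo.i, 0)
  list(WAT1, RO, DR)
}

# Hypothetical name-based variant of water.model: it does not rely on
# column positions, so it works whether or not the grouping columns are present
water.model2 <- function(dat) {
  n <- nrow(dat)
  WAT <- numeric(n)
  RO <- rep(NA_real_, n)
  DR <- rep(NA_real_, n)
  WAT[1] <- dat$top.FC[1] / 2   # initial value, as in the question
  for (d in seq_len(n - 1)) {
    res <- water.update(WAT0 = WAT[d],
                        RAIN.i = dat$Precp[d + 1],
                        ETo.i = dat$ETo[d + 1],
                        CN = dat$CN[1],
                        DC = dat$DC[1],
                        top.FC = dat$top.FC[1])
    WAT[d + 1] <- res[[1]]
    RO[d + 1] <- res[[2]]
    DR[d + 1] <- res[[3]]
  }
  dat$WAT <- WAT
  dat$RO <- RO
  dat$DR <- DR
  dat
}

# group_modify() calls the function once per (loc.id, year) group
# and adds the grouping columns back to the combined result
out <- big.data %>%
  group_by(loc.id, year) %>%
  group_modify(~ water.model2(.x)) %>%
  ungroup()
```

The result has one row per input row and the three new columns WAT, RO and DR, with RO and DR set to NA on the first day of each location-year.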