Iteratively generating results for all different combinations of a vector

Time: 2018-04-19 16:03:11

Tags: r dplyr data.table apply

My data

library(dplyr)

set.seed(123)
df <- data.frame(loc.id = rep(1:5, each = 80*3), year = rep(rep(2000:2002, each = 80), times = 5), 
             daily.rain = runif(min = 0, max = 30, 80*5*3))
df.clim <- df %>% group_by(loc.id,year) %>% summarise(tot.rain = sum(daily.rain)) %>% ungroup() %>% group_by(loc.id) %>%
      summarise(mean.rain = mean(tot.rain))
df <- df %>% left_join(df.clim, by = "loc.id")

This data has 5 locations. For each location, I have 80 days of daily rainfall data for each of 3 years, plus a column (mean.rain) giving that location's mean total rainfall.

I have a vector of percentages

per.vec <- seq(from = 1, to = 59, by = 2)

What I want to do is: for each location and year, I want to calculate the number of days it took to reach the following targets:

  from 1% to 3% of mean.rain
  from 1% to 5% of mean.rain,
  .
  .
  from 1% to 59% of mean.rain,
  from 2% to 3% of mean.rain
  from 2% to 5% of mean.rain
   .
   .
  from 2% to 59% of mean.rain
   .
   .
  from 57% to 59% of mean.rain

My final data should contain these values for each location and year.

This is something I have never done in R before, so I would like to know what a feasible approach would be.

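1 answer:

Answer 0: (score: 0)

OK, this produces a data frame with 871 columns, so the nested for loop takes a while. I'm sure there is a better way to generate the columns dynamically.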

library(dplyr)

df <- data.frame(loc.id = rep(1:5, each = 80*3), year = rep(rep(2000:2002, each = 80), times = 5), 
             daily.rain = runif(min = 0, max = 30, 80*5*3))
# mean of the yearly rainfall totals for each location
df.clim <- df %>% group_by(loc.id,year) %>% summarise(tot.rain = sum(daily.rain)) %>% ungroup() %>% group_by(loc.id) %>%
  summarise(mean.rain = mean(tot.rain))
df <- df %>% left_join(df.clim, by = "loc.id")

First, you can use group_by to group by location ID and year, then add a column with mutate that holds the running total of rainfall, computed with cumsum.

#get daily rain total
df <- df %>% group_by(loc.id,year) %>% mutate(totalrain = cumsum(daily.rain))
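To see why the grouping matters here, a minimal sketch on made-up toy data (the `toy` data frame and its values are hypothetical, not part of the question) shows that `cumsum` restarts inside each group:

```r
library(dplyr)

# Toy data (hypothetical values): two locations, three days each
toy <- data.frame(
  loc.id     = c(1, 1, 1, 2, 2, 2),
  daily.rain = c(5, 0, 10, 2, 3, 4)
)

# Because of group_by(), cumsum() is evaluated separately per location
toy <- toy %>%
  group_by(loc.id) %>%
  mutate(totalrain = cumsum(daily.rain)) %>%
  ungroup()

toy$totalrain
# 5 5 15 2 5 9
```

Without the group_by, the running total would carry over from one location to the next.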

Next, we choose the two vectors to iterate over. In this case, the inner vector runs from 3 to 59 in steps of 2, and the outer one from 1 to 57.

per.vec <- seq(from = 3, to = 59, by = 2)
per.vec2 <- 1:57
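As a quick check on how many ranges these two vectors produce, the combinations can be enumerated with base R's `expand.grid`. This is only a sketch; the `pairs` data frame and its `lower`/`upper` column names are chosen here for illustration:

```r
per.vec  <- seq(from = 3, to = 59, by = 2)  # upper bounds, as above
per.vec2 <- 1:57                            # lower bounds, as above

# Every (lower, upper) combination, keeping only those with lower < upper
pairs <- expand.grid(lower = per.vec2, upper = per.vec)
pairs <- pairs[pairs$lower < pairs$upper, ]

nrow(pairs)
# 869 ranges; adding loc.id and year gives the 871 columns mentioned earlier
```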

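Then we loop over these numbers and, on each pass, use mutate to create a new column that checks whether the cumulative rainfall on that day lies between the two percentages of mean.rain.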

#get comparison for each level
for (x in per.vec2) {
  for (y in per.vec) {
    if (x < y) {
      df <- df %>%
        group_by(loc.id, year) %>%
        mutate(!!paste(x, " - ", y, "% of mean", collapse = "") :=
                 (totalrain >= mean.rain * (x / 100)) & (totalrain <= mean.rain * (y / 100)))
    }
  }
}
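Since adding one column per range is slow, one possible alternative is to count the days directly for each range and keep the result in long format. The sketch below is not the method used in this answer: `count_days` is a hypothetical helper, and the running totals and the mean of 100 are made-up numbers for a single location and year.

```r
# Hypothetical helper: number of days whose running total lies within
# [lower%, upper%] of the mean total rainfall
count_days <- function(totalrain, mean.rain, lower, upper) {
  sum(totalrain >= mean.rain * lower / 100 &
        totalrain <= mean.rain * upper / 100)
}

# Made-up running total for one location and year
totalrain <- c(2, 4, 9, 15, 30, 60)

# A small set of ranges, one row per (lower, upper) combination
pairs <- expand.grid(lower = c(1, 2), upper = c(3, 5, 10))

# Apply the helper to every row of pairs
pairs$days <- mapply(count_days,
                     lower = pairs$lower, upper = pairs$upper,
                     MoreArgs = list(totalrain = totalrain, mean.rain = 100))
pairs$days
# 1 1 2 2 3 3
```

The same helper could in principle be applied within each loc.id and year group, which would avoid building hundreds of logical columns.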

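Finally, we combine dplyr's summarise with group_by to sum every column, which counts the number of days on which the cumulative rainfall fell between the two percentages.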

#get the total number of days by summing
#columns 3:5 (daily.rain, mean.rain, totalrain) are dropped first
results <- df[,-(3:5)] %>% group_by(loc.id,year) %>% summarise_all(sum)
head(results[,1:7])
# A tibble: 6 x 7
# Groups: loc.id [2]
#   loc.id  year `1  -  3 % of mean` `1  -  5 % of mean` `1  -  7 % of mean` `1  -  9 % of mean` `1  -  11 % of mean`
#    <int> <int>               <int>               <int>               <int>               <int>                <int>
# 1      1  2000                   3                   4                   6                   8                    8
# 2      1  2001                   1                   2                   3                   5                    7
# 3      1  2002                   1                   2                   3                   6                    7
# 4      2  2000                   2                   3                   3                   4                    5
# 5      2  2001                   3                   4                   5                   7                    9
# 6      2  2002                   1                   3                   4                   6                    7
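The summing step works because R treats TRUE as 1 and FALSE as 0, so the sum of a logical column is the number of TRUE rows. A minimal sketch on hypothetical toy data (the `toy` data frame and the `in.range` column are made up for illustration):

```r
library(dplyr)

# Toy data (hypothetical): TRUE marks a day inside the range
toy <- data.frame(
  loc.id   = c(1, 1, 1, 2, 2),
  in.range = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Summing the logical column per group counts the TRUE days
counts <- toy %>%
  group_by(loc.id) %>%
  summarise(days = sum(in.range))

counts$days
# 2 0
```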