My data:
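This data has 5 locations. For each location, I have 80 days of daily rainfall data for each of 3 years, plus a column giving that location's mean total rainfall.
I also have a vector of percentages, per.vec, which is defined further down.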
library(dplyr)  # provides %>%, group_by(), summarise(), left_join()

set.seed(123)
df <- data.frame(loc.id = rep(1:5, each = 80*3),
                 year = rep(rep(2000:2002, each = 80), times = 5),
                 daily.rain = runif(min = 0, max = 30, 80*5*3))
# mean of the yearly rainfall totals for each location
df.clim <- df %>% group_by(loc.id, year) %>% summarise(tot.rain = sum(daily.rain)) %>%
  ungroup() %>% group_by(loc.id) %>% summarise(mean.rain = mean(tot.rain))
df <- df %>% left_join(df.clim, by = "loc.id")
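What I want to do: for each location and year, I want to calculate the number of days it took to reach each of the targets listed below, which are built from this percentage vector: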
per.vec <- seq(from = 1, to = 59, by = 2)
My final data should look as follows, with one entry for each of these ranges:
from 1% to 3% of mean.rain
from 1% to 5% of mean.rain,
.
.
from 1% to 59% of mean.rain,
from 2% to 3% of mean.rain
from 2% to 5% of mean.rain
.
.
from 2% to 59% of mean.rain
.
.
from 57% to 59% of mean.rain
I have never done anything like this in R, so I would like to know what a workable approach would be.
Answer 0 (score: 0)
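OK, this produces a data frame with 871 columns, so the nested for loop takes a while. I'm sure there is a better way to generate the columns dynamically.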
library(dplyr)  # provides %>%, group_by(), mutate(), summarise(), left_join()

df <- data.frame(loc.id = rep(1:5, each = 80*3),
                 year = rep(rep(2000:2002, each = 80), times = 5),
                 daily.rain = runif(min = 0, max = 30, 80*5*3))
# mean of the yearly rainfall totals for each location
df.clim <- df %>% group_by(loc.id, year) %>% summarise(tot.rain = sum(daily.rain)) %>%
  ungroup() %>% group_by(loc.id) %>% summarise(mean.rain = mean(tot.rain))
df <- df %>% left_join(df.clim, by = "loc.id")
First, you can use group_by together with mutate and cumsum to add a column holding the running total of rainfall within each year at each location ID.
# running (cumulative) rainfall total within each location and year
df <- df %>% group_by(loc.id,year) %>% mutate(totalrain = cumsum(daily.rain))
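To see what the grouped cumsum does, here is a minimal sketch on a made-up toy data frame (the name toy and its values are illustrative and not part of the question's data). The running total restarts whenever the location or year changes.

```r
library(dplyr)

# Toy data (hypothetical values): two years at one location, three days each
toy <- data.frame(
  loc.id     = rep(1, 6),
  year       = rep(c(2000, 2001), each = 3),
  daily.rain = c(1, 2, 3, 10, 20, 30)
)

# cumsum() is applied separately within each loc.id/year group
toy <- toy %>%
  group_by(loc.id, year) %>%
  mutate(totalrain = cumsum(daily.rain)) %>%
  ungroup()

toy$totalrain
# 1 3 6 10 30 60
```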
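Next, we choose the two vectors to iterate over. In this case the inner vector runs from 3 to 59 in steps of 2, and the outer vector runs from 1 to 57.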
per.vec <- seq(from = 3, to = 59, by = 2)
per.vec2 <- 1:57
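As a quick check on how many columns these two vectors lead to, the sketch below enumerates every lower/upper pair with expand.grid and keeps those where the lower bound is smaller. The data frame name pairs is an assumption made for illustration; the answer itself uses nested loops instead.

```r
per.vec  <- seq(from = 3, to = 59, by = 2)  # upper bounds: 3, 5, ..., 59
per.vec2 <- 1:57                            # lower bounds: 1, 2, ..., 57

# All lower/upper combinations, keeping only those with lower < upper
pairs <- expand.grid(x = per.vec2, y = per.vec)
pairs <- pairs[pairs$x < pairs$y, ]

nrow(pairs)
# 869
```

The 869 ranges plus the two key columns loc.id and year account for the 871 columns mentioned at the start of the answer.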
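Then we loop over these pairs of numbers and, each time, use mutate to create a new column that tests whether the cumulative rainfall on that day lies between the two percentages of mean.rain.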
# add one logical column per (x, y) pair with x < y
for (x in per.vec2) {
  for (y in per.vec) {
    if (x < y) {
      df <- df %>%
        group_by(loc.id, year) %>%
        mutate(!!paste(x, " - ", y, "% of mean", collapse = "") :=
                 (totalrain >= mean.rain * (x / 100)) & (totalrain <= mean.rain * (y / 100)))
    }
  }
}
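Each column created by the loop is a per-day TRUE/FALSE flag. As a sketch of what a single column measures once it is summed, the hypothetical helper days_between (not part of the answer) counts the flagged days for one pair of percentages on a small made-up series.

```r
# Hypothetical helper: number of days on which the running total lies
# between x% and y% of the mean total rainfall (both bounds inclusive)
days_between <- function(totalrain, mean.rain, x, y) {
  sum(totalrain >= mean.rain * (x / 100) & totalrain <= mean.rain * (y / 100))
}

# Made-up example: with a mean total of 1000, 1% is 10 and 5% is 50
totalrain <- c(5, 12, 30, 49, 80)
days_between(totalrain, mean.rain = 1000, x = 1, y = 5)
# 3
```

Only the running totals 12, 30 and 49 fall inside the 10 to 50 band, so three days are counted.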
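Finally, we combine dplyr's summarise functions with group_by to sum each column, which gives the number of days on which the cumulative rainfall lay between the two percentages.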
# count the days in each range by summing the logical columns
# (columns 3 to 5 are daily.rain, mean.rain and totalrain)
results <- df[, -(3:5)] %>% group_by(loc.id, year) %>% summarise_all(sum)
head(results[, 1:7])
# A tibble: 6 x 7
# Groups:   loc.id [2]
#   loc.id  year `1 - 3 % of mean` `1 - 5 % of mean` `1 - 7 % of mean` `1 - 9 % of mean` `1 - 11 % of mean`
#    <int> <int>             <int>             <int>             <int>             <int>              <int>
# 1      1  2000                 3                 4                 6                 8                  8
# 2      1  2001                 1                 2                 3                 5                  7
# 3      1  2002                 1                 2                 3                 6                  7
# 4      2  2000                 2                 3                 3                 4                  5
# 5      2  2001                 3                 4                 5                 7                  9
# 6      2  2002                 1                 3                 4                 6                  7
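The counts above rely on the fact that summing a logical column counts its TRUE values. A minimal sketch of that step on made-up flags follows; the names flags, in.band and counts are illustrative, and across() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy flags (hypothetical): TRUE marks a day inside the percentage range
flags <- data.frame(
  loc.id  = c(1, 1, 1, 2, 2),
  year    = c(2000, 2000, 2000, 2000, 2000),
  in.band = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Summing a logical column counts its TRUE values within each group;
# across(everything(), ...) skips the grouping columns automatically
counts <- flags %>%
  group_by(loc.id, year) %>%
  summarise(across(everything(), sum), .groups = "drop")

counts$in.band
# 2 1
```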