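My problem is that I am trying to compute the cumulative rainfall by season (DJF, MAM, JJA, SON) and year (1926-2000), with the sum resetting to zero at the end of each season.
I have managed to do this by year using the code
rainfall$yearly.cumsum=unlist(tapply(rainfall$RR, rainfall$year, FUN=cumsum))
and tried to adapt it for seasons using
rainfall$seasonal.cumsum=unlist(tapply(rainfall$RR, .(season,year), transform, FUN=cumsum))
This returns the error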
Error in unique.default(x, nmax = nmax) :
unique() applies only to vectors
I also tried this:
rainfall$seasonal.cumsum=unlist(tapply(rainfall$RR, rainfall$season, FUN=cumsum))
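This is more promising, because it does accumulate by season, but it does not reset when the season changes. That is, I think the code sums DJF across all years, then moves on to MAM across all years, then JJA, and finally SON, rather than summing DJF for one year, resetting, summing MAM for the same year, resetting, and so on.
Here is part of the data frame. Note that yearly.cumsum is summing the values in the RR column, but seasonal.cumsum is not.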
DATE year month season RR yearly.cumsum seasonal.cumsum
19260529 1926 05 MAM 0 2347 2518
19260530 1926 05 MAM 0 2347 2518
19260531 1926 05 MAM 9 2356 2530
19260601 1926 06 JJA 0 2356 2530
19260602 1926 06 JJA 3 2359 2530
19260603 1926 06 JJA 71 2430 2530
19260604 1926 06 JJA 0 2430 2530
19260605 1926 06 JJA 48 2478 2534
I hope my question is clear enough!
Thanks.
Answer 0 (score: 2)
You could try dplyr
library(dplyr)
rainfall %>%
group_by(season, year) %>%
mutate(seasonal.cumsum=cumsum(RR))
# DATE year month season RR yearly.cumsum seasonal.cumsum
#1 19260529 1926 5 MAM 0 2347 0
#2 19260530 1926 5 MAM 0 2347 0
#3 19260531 1926 5 MAM 9 2356 9
#4 19260601 1926 6 JJA 0 2356 0
#5 19260602 1926 6 JJA 3 2359 3
#6 19260603 1926 6 JJA 71 2430 74
#7 19260604 1926 6 JJA 0 2430 74
#8 19260605 1926 6 JJA 48 2478 122
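To group consecutive months that span a calendar year boundary, you could try this (here the reset happens on March 1, which starts a new year)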
indx <- rainfall2$year-min(rainfall2$year) + rainfall2$month %in% c(1,2,12)
indx1 <- cumsum(c(TRUE,diff(indx) <0))
rainfall2$year2 <- indx1+ (min(rainfall2$year))
res <- rainfall2 %>%
group_by(season, year2) %>%
mutate(seasonal.cumsum=cumsum(RR))
do.call(rbind,lapply(split(res, res$year2), head,2))
# DATE month year season RR year2 seasonal.cumsum
#1 19260504 5 1926 MAM 50 1927 50
#2 19260505 5 1926 MAM 84 1927 134
#3 19270301 3 1927 MAM 98 1928 98
#4 19270302 3 1927 MAM 112 1928 210
#5 19280301 3 1928 MAM 91 1929 91
#6 19280302 3 1928 MAM 85 1929 176
#7 19290301 3 1929 MAM 18 1930 18
#8 19290302 3 1929 MAM 111 1930 129
If you need the year to reset on December 1
indx <- rainfall2$year-min(rainfall2$year) + !rainfall2$month %in% c(1,2,12)
indx1 <- cumsum(c(TRUE,diff(indx) <0))
rainfall2$year2 <- indx1+ (min(rainfall2$year)-1)
res2 <- rainfall2 %>%
group_by(season, year2) %>%
mutate(seasonal.cumsum=cumsum(RR))
do.call(rbind,lapply(split(res2, res2$year2), head,2))
# DATE month year season RR year2 seasonal.cumsum
#1 19260504 5 1926 MAM 50 1926 50
#2 19260505 5 1926 MAM 84 1926 134
#3 19261201 12 1926 DJF 120 1927 120
#4 19261202 12 1926 DJF 26 1927 146
#5 19271201 12 1927 DJF 112 1928 112
#6 19271202 12 1927 DJF 78 1928 190
#7 19281201 12 1928 DJF 96 1929 96
#8 19281202 12 1928 DJF 26 1929 122
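As a possible alternative to the diff/cumsum index above, the December-based year can also be computed directly from the month. This is only a sketch on hypothetical toy data (the `toy` data frame and the `season_year` column are names made up for illustration, not part of the answer), and it uses base R's `ave` rather than dplyr:

```r
# Hypothetical toy data: two days each from November, December and February.
toy <- data.frame(
  year  = c(1926, 1926, 1926, 1926, 1927, 1927),
  month = c(11, 11, 12, 12, 2, 2),
  RR    = c(5, 1, 10, 20, 3, 4)
)

# Same season labels as in the question.
toy$season <- ifelse(toy$month %in% c(12, 1, 2), "DJF",
              ifelse(toy$month %in% 3:5, "MAM",
              ifelse(toy$month %in% 6:8, "JJA", "SON")))

# December is counted with the January and February that follow it,
# so it gets the next calendar year as its season year.
toy$season_year <- toy$year + (toy$month == 12)

# ave() applies cumsum within each (season, season_year) group
# and returns the values in the original row order.
toy$seasonal.cumsum <- ave(toy$RR, toy$season, toy$season_year, FUN = cumsum)
toy
```

With this labelling, December 1926 and February 1927 fall into the same DJF group, so the running total carries across the New Year and only resets when the season changes.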
I think it is best to create a small dataset to understand this better
set.seed(24)
df <- data.frame(month=rep(rep(1:12,each=4),3), year=rep(1926:1928, each=12*4))
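First, we use %in% to check which months in the df$month column are 1, 2 or 12. It returns a logical vector that is TRUE for the elements equal to 1, 2 or 12. By negating it with !, we turn TRUE into FALSE and vice versa. This means that here we are looking for the months that are not 1, 2 or 12.
head(!df$month %in% c(1,2,12), 15)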
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
#[13] TRUE TRUE TRUE
Next, we subtract the minimum year from the year column of the dataset to get the values
df$year-min(df$year)
#[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#[38] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#[75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[112] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
If we add the two results above, the TRUE/FALSE values of the first are coerced to integer (1/0), and we get
indx <- df$year-min(df$year) + !df$month %in% c(1,2,12)
indx
#[1] 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#[38] 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
#[112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2
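To see the coercion in isolation, here is a tiny sketch with made-up values (`year_offset` and `not_winter` are illustrative names, not objects from the answer):

```r
# Years elapsed since the first year, for four hypothetical rows.
year_offset <- c(0, 0, 1, 1)

# Months 1, 12, 2 and 3: only month 3 is outside c(1, 2, 12).
# ! applies to the whole %in% expression, giving FALSE FALSE FALSE TRUE.
not_winter <- !c(1, 12, 2, 3) %in% c(1, 2, 12)

# Adding a logical to a number treats TRUE as 1 and FALSE as 0.
year_offset + not_winter
# [1] 0 0 1 2
```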
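In the second step, we first take the diff of indx, that is, the difference between adjacent elements of indx, which returns a vector one element shorter than indx. We then check where it returns values < 0. To make the lengths equal, we can prepend a value using c(TRUE,..)
head(diff(indx),55)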
#[1] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#[26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 0 0
#[51] 0 0 0 0 0
head(c(TRUE,diff(indx) <0), 55)
#[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
#[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
head(cumsum(c(TRUE,diff(indx) <0)), 55)
#[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#[39] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
indx1 <- cumsum(c(TRUE, diff(indx) <0))
indx1
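The same idiom on a short made-up vector may make the step easier to follow. This is only an illustration; `x`, `drops` and `group_id` are hypothetical names:

```r
# A sequence that drops twice (from 1 to 0, and from 2 to 1).
x <- c(0, 1, 1, 0, 1, 2, 1)

# TRUE marks the first element and every position where the value decreases.
drops <- c(TRUE, diff(x) < 0)

# The running count of those markers gives a group id that increases at each drop.
group_id <- cumsum(drops)
group_id
# [1] 1 1 1 2 2 2 3
```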
From the previous step we get indx1, and then we add the minimum year
head( indx1+ (min(df$year)),55)
#[1] 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927
#[16] 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927
#[31] 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1927 1928
#[46] 1928 1928 1928 1928 1928 1928 1928 1928 1928 1928
indx2 <- indx1+ (min(df$year))
split(df, indx2) #to check the results
rainfall <- structure(list(DATE = c(19260529L, 19260530L, 19260531L, 19260601L,
19260602L, 19260603L, 19260604L, 19260605L), year = c(1926L,
1926L, 1926L, 1926L, 1926L, 1926L, 1926L, 1926L), month = c(5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L), season = c("MAM", "MAM", "MAM",
"JJA", "JJA", "JJA", "JJA", "JJA"), RR = c(0L, 0L, 9L, 0L, 3L,
71L, 0L, 48L), yearly.cumsum = c(2347L, 2347L, 2356L, 2356L,
2359L, 2430L, 2430L, 2478L), seasonal.cumsum = c(2518L, 2518L,
2530L, 2530L, 2530L, 2530L, 2530L, 2534L)), .Names = c("DATE",
"year", "month", "season", "RR", "yearly.cumsum", "seasonal.cumsum"
), class = "data.frame", row.names = c(NA, -8L))
DATE= format(seq(as.Date("1926-05-04"), length.out=1200, by='1 day'), '%Y%m%d')
month <- as.numeric(substr(DATE,5,6))
year <- as.numeric(substr(DATE,1,4))
season <- ifelse(month %in% c(12,1,2), 'DJF',
ifelse(month %in% 3:5, 'MAM', ifelse(month %in% 6:8, 'JJA','SON')))
set.seed(25)
RR <- sample(0:120, 1200, replace=TRUE)
rainfall2 <- data.frame(DATE, month, year, season, RR, stringsAsFactors=FALSE)
Answer 1 (score: 2)
Try data.table:
> library(data.table)
> ddt = data.table(rainfall)
> ddt[,scumsum:=cumsum(RR),by=list(season,year)]
> ddt
DATE year month season RR yearly.cumsum seasonal.cumsum scumsum
1: 19260529 1926 5 MAM 0 2347 2518 0
2: 19260530 1926 5 MAM 0 2347 2518 0
3: 19260531 1926 5 MAM 9 2356 2530 9
4: 19260601 1926 6 JJA 0 2356 2530 0
5: 19260602 1926 6 JJA 3 2359 2530 3
6: 19260603 1926 6 JJA 71 2430 2530 74
7: 19260604 1926 6 JJA 0 2430 2530 74
8: 19260605 1926 6 JJA 48 2478 2534 122
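Answer 2 (score: 1)
You can actually do this with tapply as well, without creating yearly.cumsum (although I agree that tapply behaves a bit awkwardly here by reversing the order)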
transform(rainfall,
seasonal.cumsum =
unlist(rev(tapply(RR, list(season, year), FUN = cumsum))))
# DATE year month season RR yearly.cumsum seasonal.cumsum
# 1 19260529 1926 5 MAM 0 2347 0
# 2 19260530 1926 5 MAM 0 2347 0
# 3 19260531 1926 5 MAM 9 2356 9
# 4 19260601 1926 6 JJA 0 2356 0
# 5 19260602 1926 6 JJA 3 2359 3
# 6 19260603 1926 6 JJA 71 2430 74
# 7 19260604 1926 6 JJA 0 2430 74
# 8 19260605 1926 6 JJA 48 2478 122
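Since the rev() trick relies on the order in which tapply happens to return the groups, a base R option that may be more robust is `ave`, which applies a function within groups and keeps the original row order. A minimal sketch, assuming a cut-down copy of the sample data (`rainfall_demo` is a hypothetical name):

```r
# Cut-down copy of the sample rows from the question.
rainfall_demo <- data.frame(
  year   = 1926L,
  season = c("MAM", "MAM", "MAM", "JJA", "JJA", "JJA", "JJA", "JJA"),
  RR     = c(0L, 0L, 9L, 0L, 3L, 71L, 0L, 48L)
)

# Cumulative sum of RR within each season/year combination;
# the result lines up with the rows of the data frame, so no reordering is needed.
rainfall_demo$seasonal.cumsum <- with(rainfall_demo,
                                      ave(RR, season, year, FUN = cumsum))
rainfall_demo
```

On these eight rows this gives 0, 0, 9 for MAM and 0, 3, 74, 74, 122 for JJA, the same values as the answers above.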