Finding a cumulative sum in R using conditions

Date: 2017-07-11 13:49:53

Tags: r sum

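I need to create a new variable holding the sum of the past three years' amounts for each ID.

If there are not three years' worth of data, the value should be NA.

For example: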

ID YEAR   AMOUNT
1 2010      5
1 2011      2
1 2012      4
1 2013      1
1 2014      3
2 2013      4
2 2014      6
2 2015      9
3 2012      4
3 2013      7
3 2014      2
3 2015      3

The result here should be:

ID YEAR AMOUNT THREE_YR
1 2010      5       NA
1 2011      2       NA
1 2012      4       11
1 2013      1        7
1 2014      3        8
2 2013      4       NA
2 2014      6       NA
2 2015      9       19
3 2012      4       NA
3 2013      7       NA
3 2014      2       13
3 2015      3       12

How can I do this? Thanks!

3 answers:

Answer 0: (score: 2)

We can use functions from dplyr and zoo. dt2 is the final output.

# Create example data frame
dt <- read.table(text = "ID YEAR   AMOUNT
1 2010      5
                 1 2011      2
                 1 2012      4
                 1 2013      1
                 1 2014      3
                 2 2013      4
                 2 2014      6
                 2 2015      9
                 3 2012      4
                 3 2013      7
                 3 2014      2
                 3 2015      3",
                 header = TRUE, stringsAsFactors = FALSE)

# Load packages
library(dplyr)
library(zoo)

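# Process the data: right-aligned rolling sum over 3 rows within each ID
# (assumes rows are sorted by YEAR within each ID)
dt2 <- dt %>%
  group_by(ID) %>%
  mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right"))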

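Update: ID groups with fewer than 3 records.

The OP asked what to do when an ID has only one or two rows. Honestly, I did not find a good way to solve this. The only thing I can think of is to split the original data frame into two sets of groups, apply rollsum only to the groups with three or more records, and then combine everything again.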

# Create example data frame
dt <- read.table(text = "ID YEAR   AMOUNT
                 1 2010      5
                 1 2011      2
                 1 2012      4
                 1 2013      1
                 1 2014      3
                 2 2013      4
                 3 2012      4
                 3 2013      7
                 3 2014      2
                 3 2015      3",
                 header = TRUE, stringsAsFactors = FALSE)

# Load packages
library(dplyr)
library(zoo)

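# Process the data: apply rollsum only to IDs with at least 3 rows,
# then add back the smaller groups (bind_rows fills their THREE_YR with NA)
dt2 <- dt %>%
  group_by(ID) %>%
  filter(n() >= 3) %>%
  mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right")) %>%
  bind_rows(dt %>% group_by(ID) %>% filter(n() < 3)) %>%
  arrange(ID, YEAR)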
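
One way to avoid splitting the data is to compute the trailing sum with a small helper that returns NA whenever fewer than three rows are available. The following is a minimal base R sketch, not part of the original answer: `trailing_sum` is a hypothetical helper name, and the sketch assumes the rows are already sorted by YEAR within each ID and that the years are consecutive.

```r
# Same example data as in the update above: ID 2 has only one row
dt <- read.table(text = "ID YEAR AMOUNT
1 2010 5
1 2011 2
1 2012 4
1 2013 1
1 2014 3
2 2013 4
3 2012 4
3 2013 7
3 2014 2
3 2015 3", header = TRUE)

# Hypothetical helper: sum of the current value and the k - 1 previous ones,
# NA wherever fewer than k values are available
trailing_sum <- function(v, k = 3) {
  n <- length(v)
  out <- rep(NA_real_, n)
  if (n >= k) {
    cs <- cumsum(c(0, v))  # cs[i + 1] is the sum of v[1..i]
    out[k:n] <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
  }
  out
}

# ave() applies the helper within each ID and keeps the original row order
dt$THREE_YR <- ave(as.numeric(dt$AMOUNT), dt$ID, FUN = trailing_sum)
dt
```

Because the helper checks the group length itself, IDs with one or two rows simply get NA. One caveat of the cumulative-sum trick: a missing AMOUNT would turn every later value in that group into NA.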

Answer 1: (score: 1)

Using data.table

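library(data.table)
setDT(dt)            # convert the data frame dt to a data.table in place
setorder(dt, YEAR)   # sorts the whole table by YEAR, so IDs appear as 1, 3, 2 below
# Per ID: current AMOUNT plus the previous two; NA where fewer than 3 rows exist
dt[, .(YEAR, AMOUNT, THREE_YR = AMOUNT + shift(AMOUNT, 1) + shift(AMOUNT, 2)), by = .(ID)]
#    ID YEAR AMOUNT THREE_YR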
# 1:  1 2010      5       NA
# 2:  1 2011      2       NA
# 3:  1 2012      4       11
# 4:  1 2013      1        7
# 5:  1 2014      3        8
# 6:  3 2012      4       NA
# 7:  3 2013      7       NA
# 8:  3 2014      2       13
# 9:  3 2015      3       12
#10:  2 2013      4       NA
#11:  2 2014      6       NA
#12:  2 2015      9       19
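
To make the arithmetic in this answer concrete: within each ID, `shift(AMOUNT, 1)` is the previous row's value and `shift(AMOUNT, 2)` the one before that, with NA where no earlier row exists, so the sum is NA for the first two rows. The base R sketch below mimics that on a single group; `lag_by` is a hypothetical helper written for illustration, not a data.table function.

```r
# Hypothetical helper: shift v down by n positions, padding the front with NA
lag_by <- function(v, n) c(rep(NA, min(n, length(v))), head(v, -n))

amount <- c(5, 2, 4, 1, 3)   # AMOUNT for ID 1, ordered by YEAR

lag1 <- lag_by(amount, 1)    # NA  5  2  4  1
lag2 <- lag_by(amount, 2)    # NA NA  5  2  4

# Any NA in the addition makes the result NA, which yields the leading NAs
three_yr <- amount + lag1 + lag2
three_yr
```

In this sketch a group with only one or two values comes out as all NA without any special handling, because every position is missing at least one of its two lags.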

Answer 2: (score: 1)

Using zoo::rollapplyr() and aggregate().
Returns NA if a group has fewer than three members.

x <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
  3L, 3L), YEAR = c(2010L, 2011L, 2012L, 2013L, 2014L, 2013L, 2014L, 
  2015L, 2012L, 2013L, 2014L, 2015L), AMOUNT = c(5L, 2L, 4L, 1L, 
  3L, 4L, 6L, 9L, 4L, 7L, 2L, 3L)), .Names = c("ID", "YEAR", "AMOUNT"
  ), class = "data.frame", row.names = c(NA, -12L))

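library(zoo)

# For each ID, take a right-aligned rolling window of up to 3 values;
# with partial = TRUE the early windows are shorter, so return NA for them
rsum <- aggregate(AMOUNT ~ ID, data = x,
  FUN = function(amt) rollapplyr(amt, 3, fill = NA, partial = TRUE,
    FUN = function(y) if (length(y) >= 3) sum(y) else NA))

# aggregate() orders its result by ID, so x must already be sorted by ID
# for the concatenated per-ID results to line up with the rows of x
x$rsum <- do.call(c, rsum$AMOUNT)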
x
#    ID YEAR AMOUNT rsum
# 1   1 2010      5   NA
# 2   1 2011      2   NA
# 3   1 2012      4   11
# 4   1 2013      1    7
# 5   1 2014      3    8
# 6   2 2013      4   NA
# 7   2 2014      6   NA
# 8   2 2015      9   19
# 9   3 2012      4   NA
# 10  3 2013      7   NA
# 11  3 2014      2   13
# 12  3 2015      3   12

# remove one of the 2s
x <- x[-6, ]

rsum <- aggregate(AMOUNT ~ ID, data=x, 
  FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE,
  FUN=function(y) if (length(y) >= 3) sum(y) else NA))


x$rsum <- do.call(c, rsum$AMOUNT)
x
#    ID YEAR AMOUNT rsum
# 1   1 2010      5   NA
# 2   1 2011      2   NA
# 3   1 2012      4   11
# 4   1 2013      1    7
# 5   1 2014      3    8
# 7   2 2014      6   NA
# 8   2 2015      9   NA
# 9   3 2012      4   NA
# 10  3 2013      7   NA
# 11  3 2014      2   13
# 12  3 2015      3   12
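
All three answers sum the last three rows per ID, which matches "the past three years" only if no year is missing within an ID. If gaps are possible, one option is to check the YEAR values explicitly. The following is a minimal base R sketch under that assumption; the data frame `gaps` and the helper `three_year_sum` are made up for illustration, and the sketch assumes at most one row per ID and YEAR.

```r
# Made-up data: ID 1 has no row for 2013
gaps <- data.frame(
  ID     = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  YEAR   = c(2010, 2011, 2012, 2014, 2015, 2016, 2013, 2014, 2015),
  AMOUNT = c(5, 2, 4, 3, 6, 1, 4, 6, 9)
)

# Hypothetical helper: for each row, sum the amounts whose YEAR falls in
# [YEAR - 2, YEAR], but only if all three years are present
three_year_sum <- function(year, amount) {
  vapply(seq_along(year), function(i) {
    in_window <- year >= year[i] - 2 & year <= year[i]
    if (sum(in_window) == 3) sum(amount[in_window]) else NA_real_
  }, numeric(1))
}

# ave() over row indices lets the helper see both YEAR and AMOUNT per ID
gaps$THREE_YR <- ave(
  seq_len(nrow(gaps)), gaps$ID,
  FUN = function(idx) three_year_sum(gaps$YEAR[idx], gaps$AMOUNT[idx])
)
gaps
```

Here 2014 and 2015 for ID 1 get NA because 2013 is missing from their windows, whereas a purely row-based rolling sum would have summed across the gap. Since the window is defined by YEAR values, this sketch does not depend on the row order.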