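I need to create a new variable holding the sum of the past three years' AMOUNT for each ID.
If there are not three years' worth of data, the value should be NA.
For example: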
ID YEAR AMOUNT
1 2010 5
1 2011 2
1 2012 4
1 2013 1
1 2014 3
2 2013 4
2 2014 6
2 2015 9
3 2012 4
3 2013 7
3 2014 2
3 2015 3
The result here should be:
ID YEAR AMOUNT THREE_YR
1 2010 5 NA
1 2011 2 NA
1 2012 4 11
1 2013 1 7
1 2014 3 8
2 2013 4 NA
2 2014 6 NA
2 2015 9 19
3 2012 4 NA
3 2013 7 NA
3 2014 2 13
3 2015 3 12
How can I do this? Thanks!
Answer 0 (score: 2)
We can use functions from the dplyr and zoo packages. dt2 is the final output.
# Create example data frame
dt <- read.table(text = "ID YEAR AMOUNT
1 2010 5
1 2011 2
1 2012 4
1 2013 1
1 2014 3
2 2013 4
2 2014 6
2 2015 9
3 2012 4
3 2013 7
3 2014 2
3 2015 3",
header = TRUE, stringsAsFactors = FALSE)
# Load packages
library(dplyr)
library(zoo)
# Process the data
dt2 <- dt %>%
group_by(ID) %>%
mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right"))
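The OP asked what to do when an ID has only one or two rows. Honestly, I have not found a good way to handle this. The only approach I can think of is to split the original data frame into two sets of groups, apply rollsum only to the groups with three or more records, and then combine everything again.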
# Create example data frame
dt <- read.table(text = "ID YEAR AMOUNT
1 2010 5
1 2011 2
1 2012 4
1 2013 1
1 2014 3
2 2013 4
3 2012 4
3 2013 7
3 2014 2
3 2015 3",
header = TRUE, stringsAsFactors = FALSE)
# Load packages
library(dplyr)
library(zoo)
# Process the data
dt2 <- dt %>%
group_by(ID) %>%
filter(n() >= 3) %>%
mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right")) %>%
bind_rows(dt %>% group_by(ID) %>% filter(n() < 3)) %>%
arrange(ID, YEAR)
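As a possible alternative to splitting and recombining, here is a minimal base R sketch that handles short groups directly. The helper name roll_sum3 is hypothetical (it is not part of dplyr or zoo), and the sketch assumes one row per ID and YEAR, with no gaps between years.

```r
# Example data: ID 2 has a single row
dt <- read.table(text = "ID YEAR AMOUNT
1 2010 5
1 2011 2
1 2012 4
1 2013 1
1 2014 3
2 2013 4
3 2012 4
3 2013 7
3 2014 2
3 2015 3",
header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: right-aligned rolling sum of width k,
# returning NA wherever fewer than k values are available
roll_sum3 <- function(v, k = 3) {
  n <- length(v)
  out <- rep(NA_real_, n)
  if (n >= k) {
    cs <- cumsum(c(0, v))
    out[k:n] <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
  }
  out
}

# Sort so each group's rows are in year order, then apply per ID
dt <- dt[order(dt$ID, dt$YEAR), ]
dt$THREE_YR <- ave(dt$AMOUNT, dt$ID, FUN = roll_sum3)
dt
```

Because the helper returns all NA for any group shorter than k, no filtering or bind_rows step is needed.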
Answer 1 (score: 1)
Using data.table:
library(data.table)
setDT(dt)
setorder(dt,YEAR)
dt[,.(YEAR,AMOUNT,THREE_YR=AMOUNT+shift(AMOUNT,1)+shift(AMOUNT,2)),by=.(ID)]
#ID YEAR AMOUNT THREE_YR
# 1: 1 2010 5 NA
# 2: 1 2011 2 NA
# 3: 1 2012 4 11
# 4: 1 2013 1 7
# 5: 1 2014 3 8
# 6: 3 2012 4 NA
# 7: 3 2013 7 NA
# 8: 3 2014 2 13
# 9: 3 2015 3 12
#10: 2 2013 4 NA
#11: 2 2014 6 NA
#12: 2 2015 9 19
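A variant sketch of the same idea, rebuilding the example data so it runs on its own: sorting by ID and then YEAR keeps the rows grouped in ID order, and := adds the column by reference so all existing columns are kept. shift() with a vector of offsets returns a list, which Reduce() adds up.

```r
library(data.table)

dt <- data.table(
  ID     = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  YEAR   = c(2010L, 2011L, 2012L, 2013L, 2014L, 2013L, 2014L, 2015L,
             2012L, 2013L, 2014L, 2015L),
  AMOUNT = c(5L, 2L, 4L, 1L, 3L, 4L, 6L, 9L, 4L, 7L, 2L, 3L)
)

# Order by ID first, then YEAR, so output stays grouped by ID
setorder(dt, ID, YEAR)

# Current value plus the two previous values within each ID;
# the first two rows of each group get NA because the lags are NA
dt[, THREE_YR := AMOUNT + Reduce(`+`, shift(AMOUNT, 1:2)), by = ID]
dt
```

Groups with only one or two rows come out as all NA here, since at least one lagged value is always missing.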
Answer 2 (score: 1)
Use zoo::rollapplyr() together with aggregate(). This returns NA when a group has fewer than three members.
x <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 3L), YEAR = c(2010L, 2011L, 2012L, 2013L, 2014L, 2013L, 2014L,
2015L, 2012L, 2013L, 2014L, 2015L), AMOUNT = c(5L, 2L, 4L, 1L,
3L, 4L, 6L, 9L, 4L, 7L, 2L, 3L)), .Names = c("ID", "YEAR", "AMOUNT"
), class = "data.frame", row.names = c(NA, -12L))
library(zoo)
rsum <- aggregate(AMOUNT ~ ID, data=x,
FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE,
FUN=function(y) if (length(y) >= 3) sum(y) else NA))
x$rsum <- do.call(c, rsum$AMOUNT)
x
# ID YEAR AMOUNT rsum
# 1 1 2010 5 NA
# 2 1 2011 2 NA
# 3 1 2012 4 11
# 4 1 2013 1 7
# 5 1 2014 3 8
# 6 2 2013 4 NA
# 7 2 2014 6 NA
# 8 2 2015 9 19
# 9 3 2012 4 NA
# 10 3 2013 7 NA
# 11 3 2014 2 13
# 12 3 2015 3 12
# remove one of the rows for ID 2, leaving that group with only two rows
x <- x[-6, ]
rsum <- aggregate(AMOUNT ~ ID, data=x,
FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE,
FUN=function(y) if (length(y) >= 3) sum(y) else NA))
x$rsum <- do.call(c, rsum$AMOUNT)
x
# ID YEAR AMOUNT rsum
# 1 1 2010 5 NA
# 2 1 2011 2 NA
# 3 1 2012 4 11
# 4 1 2013 1 7
# 5 1 2014 3 8
# 7 2 2014 6 NA
# 8 2 2015 9 NA
# 9 3 2012 4 NA
# 10 3 2013 7 NA
# 11 3 2014 2 13
# 12 3 2015 3 12
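One caveat worth noting: the rolling approaches above sum the last three rows per ID, which equals the last three years only when the years are consecutive. If a year could be missing, a sketch like the following checks the calendar window explicitly. The helper name three_year_sum is hypothetical, and it assumes at most one row per ID and YEAR.

```r
# Example data with a gap: ID 1 has no row for 2012
x <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  YEAR   = c(2010L, 2011L, 2013L, 2014L, 2015L, 2013L, 2014L, 2015L),
  AMOUNT = c(5L, 2L, 1L, 3L, 4L, 4L, 6L, 9L)
)

# Hypothetical helper: for each row, sum AMOUNT over the years
# [YEAR - 2, YEAR]; return NA unless all three years are present
three_year_sum <- function(year, amount) {
  vapply(seq_along(year), function(i) {
    in_window <- year >= year[i] - 2 & year <= year[i]
    if (sum(in_window) == 3) as.numeric(sum(amount[in_window])) else NA_real_
  }, numeric(1))
}

# Apply per ID and put the results back in the original row order
x$THREE_YR <- unsplit(
  lapply(split(x, x$ID), function(g) three_year_sum(g$YEAR, g$AMOUNT)),
  x$ID
)
x
```

This version does not require the rows to be sorted, at the cost of a quadratic scan within each group, which should be fine for short per-ID histories.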