I need to do some intermediate calculations in R. Here is some data about events and their types.
mydf3 <- structure(list(year = c(1994, 1995, 1997, 1997, 1998, 1998, 1998,
2000, 2000, 2001, 2001, 2002), N = c(3L, 1L, 1L, 4L, 1L, 1L,
4L, 1L, 2L, 1L, 5L, 1L), type = c("OIL", "LNG", "AGS", "OIL",
"DOCK", "LNG", "OIL", "LNG", "OIL", "LNG", "OIL", "DOCK")), .Names = c("year",
"N", "type"), row.names = c(NA, 12L), class = "data.frame")
> head(mydf3)
  year N type
1 1994 3  OIL
2 1995 1  LNG
3 1997 1  AGS
4 1997 4  OIL
5 1998 1 DOCK
6 1998 1  LNG
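I need cumulative sums by year and type: the count in each year, the running total per type up to that year, and the running total across all types up to that year.
So I need output like this: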
year type cntyear cnt_cumultype cnt_cumulalltypes
1994 OIL  3       3             3
1994 LNG  0       0             3
1994 AGS  0       0             3
1994 DOCK 0       0             3
1995 OIL  0       3             4
1995 LNG  1       1             4
1995 AGS  0       0             4
1995 DOCK 0       0             4
...
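Some explanations: cntyear is the count for that year and type, cnt_cumultype is the cumulative count for that type up to and including that year, and cnt_cumulalltypes is the cumulative count over all types up to and including that year.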
I tried something like this, but it does not give the right result:
mydf3$cnt_cumultype<-tail(cumsum(mydf3[which(mydf3$type==mydf3$type & mydf3$year==mydf3$year),]$N), n=1)
How can I calculate these numbers row by row?
Answer 0 (score: 0)
Here is a solution using the data.table package. This can also be solved in base R, but data.table shortens one of the steps.
# load library
library(data.table)
# cast the question's data (mydf3) to a data.table named df and change column order
df <- as.data.table(mydf3)
setcolorder(df, c("year", "type", "N"))
# change column names
setnames(df, names(df), c("year", "type", "cntyear"))
# get all type-year combinations with `CJ` and join them to the original data
# then, in the second [, set the missing counts to 0
df2 <- df[CJ("year"=unique(df$year), "type"=unique(df$type)), on=c("year", "type")
][is.na(cntyear), cntyear := 0]
# get cumulative counts for each type
df2[, cnt_cumultype := cumsum(cntyear), by=type]
# get the running total across all types (accumulated row by row, in year/type order)
df2[, cnt_cumulalltypes := cumsum(cntyear)]
This results in
df2
   year type cntyear cnt_cumultype cnt_cumulalltypes
1: 1994  AGS       0             0                 0
2: 1994 DOCK       0             0                 0
3: 1994  LNG       0             0                 0
4: 1994  OIL       3             3                 3
5: 1995  AGS       0             0                 3
6: 1995 DOCK       0             0                 3
7: 1995  LNG       1             1                 4
8: 1995  OIL       0             3                 4
9: 1997  AGS       1             1                 5
....
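Note that `cumsum(cntyear)` without `by` accumulates row by row, so within a year `cnt_cumulalltypes` differs from row to row (0, 0, 0, 3 for 1994), whereas the table in the question repeats the year-end total on every row of that year. A minimal sketch of one way to get that behaviour: sum the counts within each year, accumulate those yearly sums, and join them back. The names `dt`, `full`, `yearly` and `year_total` are placeholders introduced here, not part of the answer above, and the data is rebuilt from the question so the block runs on its own.

```r
library(data.table)

# The question's data, rebuilt here so the sketch is self-contained
mydf3 <- data.frame(
  year = c(1994, 1995, 1997, 1997, 1998, 1998, 1998, 2000, 2000, 2001, 2001, 2002),
  N    = c(3L, 1L, 1L, 4L, 1L, 1L, 4L, 1L, 2L, 1L, 5L, 1L),
  type = c("OIL", "LNG", "AGS", "OIL", "DOCK", "LNG", "OIL",
           "LNG", "OIL", "LNG", "OIL", "DOCK"),
  stringsAsFactors = FALSE
)

# Work on a copy so mydf3 itself is left untouched
dt <- as.data.table(mydf3)
setnames(dt, "N", "cntyear")

# Every year/type combination; CJ() returns them sorted by year, then type
full <- dt[CJ(year = unique(dt$year), type = unique(dt$type)),
           on = c("year", "type")]
full[is.na(cntyear), cntyear := 0L]

# Cumulative count per type (rows are already in year order within each type)
full[, cnt_cumultype := cumsum(cntyear), by = type]

# Cumulative count across all types: total per year first, then accumulate
yearly <- full[, .(year_total = sum(cntyear)), by = year]
yearly[, cnt_cumulalltypes := cumsum(year_total)]

# Join the year-level running total back onto every row of that year
full[yearly, cnt_cumulalltypes := i.cnt_cumulalltypes, on = "year"]

setcolorder(full, c("year", "type", "cntyear",
                    "cnt_cumultype", "cnt_cumulalltypes"))
full[year <= 1995]
```

With this variant every 1994 row carries 3 and every 1995 row carries 4 in `cnt_cumulalltypes`, as in the question's expected table; only the row order within a year differs, because `CJ()` sorts the types alphabetically.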