Using R

Time: 2017-02-15 14:15:24

Tags: r dataframe calculation

I need to compute some intermediate figures using R. Below is data on some events and their types.

mydf3 <- structure(list(year = c(1994, 1995, 1997, 1997, 1998, 1998, 1998, 
2000, 2000, 2001, 2001, 2002), N = c(3L, 1L, 1L, 4L, 1L, 1L, 
4L, 1L, 2L, 1L, 5L, 1L), type = c("OIL", "LNG", "AGS", "OIL", 
"DOCK", "LNG", "OIL", "LNG", "OIL", "LNG", "OIL", "DOCK")), .Names = c("year", 
"N", "type"), row.names = c(NA, 12L), class = "data.frame")


> head(mydf3)
  year N type
1 1994 3  OIL
2 1995 1  LNG
3 1997 1  AGS
4 1997 4  OIL
5 1998 1 DOCK
6 1998 1  LNG

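For every year and type, I need the count for that year, the cumulative sum for that type up to the current year, and the cumulative sum over all types up to the current year.

So I need to end up with something like this: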

year type cntyear cnt_cumultype cnt_cumulalltypes
1994 OIL  3       3             3
1994 LNG  0       0             3
1994 AGS  0       0             3
1994 DOCK 0       0             3
1995 OIL  0       3             4
1995 LNG  1       1             4
1995 AGS  0       0             4
1995 DOCK 0       0             4
...

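Some explanation:

  1. cntyear - the count N for the current year and type.
  2. cnt_cumultype - the cumulative sum for this type up to and including the current year.
  3. cnt_cumulalltypes - the cumulative sum over all types for all years up to and including the current one (year <= current year).

I tried something like this, but it does not give the right result...

    mydf3$cnt_cumultype<-tail(cumsum(mydf3[which(mydf3$type==mydf3$type & mydf3$year==mydf3$year),]$N), n=1)

How can I calculate these numbers row by row?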

1 answer:

Answer 0 (score: 0)

Here is a solution using the data.table package. This can also be solved in base R, but data.table saves a step.

# load library
library(data.table)
# cast df (the question's data frame) to a data.table and change column order
setcolorder(setDT(df), c("year", "type", "N"))
# change column names
setnames(df, names(df), c("year", "type", "cntyear"))

# build all year-type combinations with `CJ` and join the original data onto them
# then, in the second [, set the missing counts (NA) to 0
df2 <- df[CJ("year"=unique(df$year), "type"=unique(df$type)), on=c("year", "type")
          ][is.na(cntyear),  cntyear := 0]
# get cumulative counts for each type
df2[, cnt_cumultype := cumsum(cntyear), by=type]
# get the running total over all rows (rows are in year, type order after the join)
df2[, cnt_cumulalltypes := cumsum(cntyear)]

This results in

df2
    year type cntyear cnt_cumultype cnt_cumulalltypes
 1: 1994  AGS       0             0                 0
 2: 1994 DOCK       0             0                 0
 3: 1994  LNG       0             0                 0
 4: 1994  OIL       3             3                 3
 5: 1995  AGS       0             0                 3
 6: 1995 DOCK       0             0                 3
 7: 1995  LNG       1             1                 4
 8: 1995  OIL       0             3                 4
 9: 1997  AGS       1             1                 5
    ....
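
Note that in the output above cnt_cumulalltypes is a running total over rows, so rows of the same year can differ (0, 0, 0, 3 for 1994), while the table in the question repeats one year-level total on every row of a year (3 for 1994, 4 for 1995). The following is a minimal base R sketch, assuming the year-level reading is what is wanted. The names `grid`, `res`, `year_totals` and `year_cumul` are made up for illustration and do not come from the question or the answer.

```r
# Sample data from the question (called mydf3 there).
mydf3 <- data.frame(
  year = c(1994, 1995, 1997, 1997, 1998, 1998, 1998, 2000, 2000, 2001, 2001, 2002),
  N    = c(3L, 1L, 1L, 4L, 1L, 1L, 4L, 1L, 2L, 1L, 5L, 1L),
  type = c("OIL", "LNG", "AGS", "OIL", "DOCK", "LNG",
           "OIL", "LNG", "OIL", "LNG", "OIL", "DOCK"),
  stringsAsFactors = FALSE
)

# Every year-type combination, so that missing pairs appear with a count of 0.
grid <- expand.grid(year = sort(unique(mydf3$year)),
                    type = unique(mydf3$type),
                    stringsAsFactors = FALSE)
res <- merge(grid, mydf3, by = c("year", "type"), all.x = TRUE)
res$N[is.na(res$N)] <- 0L
names(res)[names(res) == "N"] <- "cntyear"

# Sort chronologically so that cumsum() within each type runs in year order.
res <- res[order(res$year, res$type), ]
rownames(res) <- NULL

# Cumulative sum per type, up to and including each year.
res$cnt_cumultype <- ave(res$cntyear, res$type, FUN = cumsum)

# Cumulative total over all types: sum per year, accumulate over years,
# then look the value up for each row by its year.
year_totals <- sapply(split(res$cntyear, res$year), sum)
year_cumul  <- cumsum(year_totals)
res$cnt_cumulalltypes <- as.vector(year_cumul[as.character(res$year)])

head(res, 8)
```

With this sketch every 1994 row carries cnt_cumulalltypes = 3 and every 1995 row carries 4, as in the question's table. The rows come out with types in alphabetical order rather than in the order shown in the question.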