R: create first differences for a combination of groups and non-groups

Time: 2015-04-14 18:07:08

Tags: r data.table

I have this data.table, which has some group-specific data as well as some general data:

         group year      flow      agg
   1: 51557094 2010   3.46000 592649.6
   2: 51557133 1999 111.60000 522706.2
   3: 51557133 2000  29.36000 555279.7
   4: 51557133 2003  96.38000 592649.6
   5: 51557193 2004  65.22000 550622.4

Here flow is group-year specific, while agg is year specific. I want to compute first differences: for flow, based on group and year; for agg, without grouping, just by year.

I'd prefer a method that doesn't involve dplyr.
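A minimal base-R sketch of the two rules described above, assuming that "first difference" means the change from the previous observed year (not only from the immediately preceding calendar year), that a group's earliest row gets NA, and that agg has a single value per year. The data.frame `tbl` and the helper `first_diff` are illustrative names, not part of the question.

```r
# Question data as a plain data.frame (illustrative name: tbl)
tbl <- data.frame(
  group = c(51557094L, 51557133L, 51557133L, 51557133L, 51557193L),
  year  = c(2010L, 1999L, 2000L, 2003L, 2004L),
  flow  = c(3.46, 111.6, 29.36, 96.38, 65.22),
  agg   = c(592649.6, 522706.2, 555279.7, 592649.6, 550622.4)
)

# Hypothetical helper: first difference padded with a leading NA,
# so the result has the same length as the input
first_diff <- function(x) c(NA, diff(x))

# Sort by group, then year, so differences within a group follow time order
tbl <- tbl[order(tbl$group, tbl$year), ]

# flow is group-year specific: difference within each group
tbl$dFlow <- ave(tbl$flow, tbl$group, FUN = first_diff)

# agg is year specific: difference across the distinct years only
# (assumes one agg value per year), then map each row back to its year
yrs <- sort(unique(tbl$year))
agg_by_year <- tbl$agg[match(yrs, tbl$year)]
tbl$dAgg <- first_diff(agg_by_year)[match(tbl$year, yrs)]

print(tbl)
```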

Expected output


2 Answers:

Answer 0 (score: 5)

You could try

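 library(data.table)
 myDataTable[, ind := 1:.N][order(year)][     # keep original row order in ind, then sort by year
    seq_len(.N) %in% 1:2,                     # i: only the first two year-sorted rows
    dFlow := c(NA, diff(flow)), by = group][  # flow differences within group
    , dAgg := c(NA, diff(agg)),               # agg differences, grouped by ...
    cumsum(c(TRUE, diff(year) != 1))][        # ... runs of consecutive years
    order(ind)][, 3:5 := NULL][]              # restore order, drop flow, agg and ind, print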
  #      group year  dFlow     dAgg
  #1: 51557094 2010     NA       NA
  #2: 51557133 1999     NA       NA
  #3: 51557133 2000 -82.24  32573.5
  #4: 51557133 2003     NA       NA
  #5: 51557193 2004     NA -42027.2
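The `by` expression `cumsum(c(TRUE, diff(year) != 1))` is what restricts dAgg to consecutive years: on the year-sorted rows it starts a new run id whenever the gap to the previous year is not exactly 1, so differences are only taken inside each run. Note also that the `i` expression `seq_len(.N) %in% 1:2` appears to select just the first two rows of the year-sorted table, which happens to cover the only consecutive-year pair within a group in this example. A small base-R sketch of the run-id step, using the example's sorted years and agg values:

```r
# Years and agg values after sorting by year, as in the answer's order(year) step
years <- c(1999L, 2000L, 2003L, 2004L, 2010L)
agg   <- c(522706.2, 555279.7, 592649.6, 550622.4, 592649.6)

# A new run starts at the first row and wherever the gap to the previous year is not 1
run_id <- cumsum(c(TRUE, diff(years) != 1))
print(run_id)  # run ids: 1 1 2 2 3

# First differences of agg, computed only inside each run of consecutive years
dAgg <- ave(agg, run_id, FUN = function(x) c(NA, diff(x)))
print(dAgg)
```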

Data

library(data.table)

df2 <- structure(list(group = c(51557094L, 51557133L, 51557133L,
    51557133L, 51557193L), year = c(2010L, 1999L, 2000L, 2003L, 2004L),
    flow = c(3.46, 111.6, 29.36, 96.38, 65.22),
    agg = c(592649.6, 522706.2, 555279.7, 592649.6, 550622.4)),
    .Names = c("group", "year", "flow", "agg"),
    class = "data.frame", row.names = c("1:", "2:", "3:", "4:", "5:"))

myDataTable <- as.data.table(df2)

Answer 1 (score: 2)

Here is a dplyr approach. First we apply diff(log(agg)) to all the data, then we use group_by(group) to apply diff(flow) within each group.

library(dplyr) 

dat %>% arrange(year) %>%
  mutate(diffAgg = c(NA, diff(log(agg)))) %>%
  group_by(group) %>%
  mutate(diffFlow = c(NA, diff(flow)))

     group year   flow      agg     diffAgg diffFlow
1 51557133 1999 111.60 522706.2          NA       NA
2 51557193 2004  65.22 550622.4 0.052029728       NA
3 51557133 2005  29.36 555279.7 0.008422676   -82.24
4 51557094 2010   3.46 592649.6 0.065131380       NA
5 51557133 2010  96.38 592649.6 0.000000000    67.02
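For readers who, like the asker, would rather avoid dplyr, here is a rough base-R translation of the same pipeline, sketched on the question's data (the answer's own `dat` and output table use different years for group 51557133, so the numbers will not line up with the table above). Note that `diff(log(agg))` yields log differences, i.e. approximately relative changes, rather than the plain first differences the question asks for. The name `d` is illustrative.

```r
# Question data as a plain data.frame (illustrative name: d)
d <- data.frame(
  group = c(51557094L, 51557133L, 51557133L, 51557133L, 51557193L),
  year  = c(2010L, 1999L, 2000L, 2003L, 2004L),
  flow  = c(3.46, 111.6, 29.36, 96.38, 65.22),
  agg   = c(592649.6, 522706.2, 555279.7, 592649.6, 550622.4)
)

# arrange(year)
d <- d[order(d$year), ]

# mutate(diffAgg = c(NA, diff(log(agg)))): ungrouped log difference
d$diffAgg <- c(NA, diff(log(d$agg)))

# group_by(group) %>% mutate(diffFlow = c(NA, diff(flow))):
# ave() applies the function within each group and keeps the existing row order
d$diffFlow <- ave(d$flow, d$group, FUN = function(x) c(NA, diff(x)))

print(d)
```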