I have this data.table, which contains some group-specific data and some general data:
      group year      flow      agg
1: 51557094 2010   3.46000 592649.6
2: 51557133 1999 111.60000 522706.2
3: 51557133 2000  29.36000 555279.7
4: 51557133 2003  96.38000 592649.6
5: 51557193 2004  65.22000 550622.4
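Here flow is group-year specific, while agg is year specific. I want to compute first differences: for flow, based on group and year; and for agg, without grouping, based only on year.
I would prefer an approach that does not involve dplyr.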
Answer 0 (score: 5)
You could try
library(data.table)
# ind stores the original row order; dAgg is differenced within runs of consecutive years
myDataTable[, ind := 1:.N][order(year)][seq_len(.N) %in% 1:2,
    dFlow := c(NA, diff(flow)), by = group][,
    dAgg := c(NA, diff(agg)), cumsum(c(TRUE, diff(year) != 1))][
    order(ind)][, 3:5 := NULL][]
#      group year  dFlow     dAgg
#1: 51557094 2010     NA       NA
#2: 51557133 1999     NA       NA
#3: 51557133 2000 -82.24  32573.5
#4: 51557133 2003     NA       NA
#5: 51557193 2004     NA -42027.2
# Data
df2 <- structure(list(group = c(51557094L, 51557133L, 51557133L, 51557133L,
51557193L), year = c(2010L, 1999L, 2000L, 2003L, 2004L), flow = c(3.46,
111.6, 29.36, 96.38, 65.22), agg = c(592649.6, 522706.2, 555279.7,
592649.6, 550622.4)), .Names = c("group", "year", "flow", "agg"
), class = "data.frame", row.names = c("1:", "2:", "3:", "4:",
"5:"))
myDataTable <- as.data.table(df2)
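A possible alternative is sketched below using shift(). It assumes that flow should be differenced against the previous observed year within each group (gaps between years are ignored) and that agg should be differenced against the previous year present in the data. The names dt and aggByYear are only illustrative. Because gaps are ignored, this fills in more rows than the output above.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(
  group = c(51557094L, 51557133L, 51557133L, 51557133L, 51557193L),
  year  = c(2010L, 1999L, 2000L, 2003L, 2004L),
  flow  = c(3.46, 111.6, 29.36, 96.38, 65.22),
  agg   = c(592649.6, 522706.2, 555279.7, 592649.6, 550622.4)
)

# Sort by group and year so "previous row" means "previous observed year"
setorder(dt, group, year)

# flow: difference to the previous observation within each group
dt[, dFlow := flow - shift(flow), by = group]

# agg is year specific: keep one row per (year, agg) pair,
# difference across years, then join the result back by year
aggByYear <- unique(dt[, .(year, agg)])[order(year)]
aggByYear[, dAgg := agg - shift(agg)]
dt[aggByYear, dAgg := i.dAgg, on = "year"]

print(dt)
```

If differences should only be taken between strictly consecutive years, an extra condition on year - shift(year) would be needed; that is not shown here.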
Answer 1 (score: 2)
Here is a dplyr approach. First we apply diff(log(agg)) to all of the data, then we use group_by(group) to apply diff(flow) by group.
library(dplyr)
dat %>% arrange(year) %>%
    mutate(diffAgg = c(NA, diff(log(agg)))) %>%
    group_by(group) %>%
    mutate(diffFlow = c(NA, diff(flow)))
     group year   flow      agg     diffAgg diffFlow
1 51557133 1999 111.60 522706.2          NA       NA
2 51557193 2004  65.22 550622.4 0.052029728       NA
3 51557133 2005  29.36 555279.7 0.008422676   -82.24
4 51557094 2010   3.46 592649.6 0.065131380       NA
5 51557133 2010  96.38 592649.6 0.000000000    67.02
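Note that diff(log(agg)) is a log difference (roughly the relative change between consecutive rows), not the plain first difference of agg. The base R sketch below contrasts the two on the agg values from the output above; the variable names dAgg and dLogAgg are only illustrative.

```r
# agg values in year order, taken from the output above
agg <- c(522706.2, 550622.4, 555279.7, 592649.6, 592649.6)

# Plain first difference: change in the units of agg
dAgg <- c(NA, diff(agg))

# Log difference: approximately the relative change between consecutive rows
dLogAgg <- c(NA, diff(log(agg)))

print(data.frame(agg, dAgg, dLogAgg))
```

If the plain first difference is what is wanted, replacing diff(log(agg)) with diff(agg) in the mutate() call should be enough.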