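I want to compute the differences of variables in a data.table, grouped by id. Here is some sample data. The data were recorded at a sampling rate of 1 Hz, and I want to estimate the first and second derivatives (velocity and acceleration).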
df <- read.table(text='x y id
1 2 1
2 4 1
3 5 1
1 8 2
5 2 2
6 3 2',header=TRUE)
library(data.table)
dt <- data.table(df)
Expected output:
# dx dy id
# NA NA 1
# 1 2 1
# 1 1 1
# NA NA 2
# 4 -6 2
# 1 1 2
Here is what I tried:
dx_dt<-dt[, diff:=c(NA,diff(dt[,'x',with=FALSE])),by = id]
Output:
Error in `[.data.frame`(dt, , `:=`(diff, c(NA, diff(dt[, "x", with = FALSE]))), :
unused argument (by = id)
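As Akrun pointed out, the "velocity" terms (dx, dy) can be obtained with either data.table or dplyr. However, I don't understand the computation well enough to extend it to the acceleration terms. So how do I compute the second lagged difference?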
dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id]
This produces:
x y id dx dy
1: 1 2 1 NA NA
2: 2 4 1 1 2
3: 3 5 1 1 1
4: 1 8 2 NA NA
5: 5 2 2 4 -6
6: 6 3 2 1 1
How can this be extended to get the second differences, i.e. the differences of dx and dy?
x y id dx dy dx2 dy2
1: 1 2 1 NA NA NA NA
2: 2 4 1 1 2 NA NA
3: 3 5 1 1 1 0 -1
4: 1 8 2 NA NA NA NA
5: 5 2 2 4 -6 NA NA
6: 6 3 2 1 1 -3 7
Answer 0 (score: 1)
You can try:
setnames(dt[, lapply(.SD, function(x) c(NA,diff(x))), by=id],
2:3, c('dx', 'dy'))[]
# id dx dy
#1: 1 NA NA
#2: 1 1 2
#3: 1 1 1
#4: 2 NA NA
#5: 2 4 -6
#6: 2 1 1
Another option is to use dplyr:
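library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr;
# across() (dplyr >= 1.0.0) does the same job here
df %>%
  group_by(id) %>%
  mutate(across(everything(), ~ c(NA, diff(.x)))) %>%
  rename(dx = x, dy = y)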
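The dplyr approach can be carried one step further to get the second differences as well. The following is only a sketch, assuming dplyr >= 1.0.0 (for `across()` and its `.names` argument) and the sample `df` from the question. The helper name `lag_diff` is made up for illustration.

```r
library(dplyr)

df <- read.table(text = 'x y id
1 2 1
2 4 1
3 5 1
1 8 2
5 2 2
6 3 2', header = TRUE)

# Hypothetical helper: first difference, padded with a leading NA
lag_diff <- function(v) c(NA, diff(v))

res <- df %>%
  group_by(id) %>%
  # first differences: x -> dx, y -> dy (x and y are kept)
  mutate(across(c(x, y), lag_diff, .names = "d{.col}")) %>%
  # second differences: dx -> dx2, dy -> dy2
  mutate(across(c(dx, dy), lag_diff, .names = "{.col}2")) %>%
  ungroup()

res
```

Because the second `mutate()` still runs inside `group_by(id)`, the differences never cross an id boundary, so the first two rows of each group are NA in dx2 and dy2.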
With data.table, you can repeat this step to get the second differences:
dt[, c('dx', 'dy'):=lapply(.SD, function(x) c(NA, diff(x))), by=id]
dt[,c('dx2', 'dy2'):= lapply(.SD, function(x) c(NA, diff(x))),
by=id, .SDcols=4:5]
dt
# x y id dx dy dx2 dy2
#1: 1 2 1 NA NA NA NA
#2: 2 4 1 1 2 NA NA
#3: 3 5 1 1 1 0 -1
#4: 1 8 2 NA NA NA NA
#5: 5 2 2 4 -6 NA NA
#6: 6 3 2 1 1 -3 7
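As an alternative to differencing twice, base R's `diff()` takes a `differences` argument, so the second difference can be computed straight from x and y. The sketch below also scales by the sampling rate to turn differences into derivative estimates. The helper `nth_deriv` and the variable `fs` are hypothetical names, and the code assumes every id has at least as many rows as the order of the difference.

```r
library(data.table)

dt <- data.table(x  = c(1, 2, 3, 1, 5, 6),
                 y  = c(2, 4, 5, 8, 2, 3),
                 id = c(1, 1, 1, 2, 2, 2))

fs <- 1  # sampling rate in Hz (1 Hz in the question), so the time step is 1 / fs seconds

# Hypothetical helper: k-th order difference padded with k leading NAs and
# scaled to a derivative estimate (difference / step^k == difference * fs^k).
# Assumes the input has at least k elements.
nth_deriv <- function(v, k, fs = 1) {
  c(rep(NA_real_, k), diff(v, differences = k)) * fs^k
}

cols <- c("x", "y")
# velocity estimates: dx, dy
dt[, paste0("d", cols) := lapply(.SD, nth_deriv, k = 1, fs = fs),
   by = id, .SDcols = cols]
# acceleration estimates: dx2, dy2
dt[, paste0("d", cols, "2") := lapply(.SD, nth_deriv, k = 2, fs = fs),
   by = id, .SDcols = cols]
dt[]
```

Naming the columns in `.SDcols` rather than using positions such as `4:5` keeps the call correct even if the column order of `dt` changes.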
Or we can use the shift function from data.table:
dt[, paste0("d", c("x", "y")) := .SD - shift(.SD), by = id
][, paste0("d", c("x2", "y2")) := .SD - shift(.SD) , by = id, .SDcols = 4:5 ]
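To see why subtracting `shift()` gives the same result as `c(NA, diff(x))`, it may help to look at a single vector. This small sketch uses the x values of id 2 from the sample data. The variable name `d2` is only illustrative.

```r
library(data.table)

x <- c(1, 5, 6)

shift(x)       # lagged copy of x: NA 1 5
x - shift(x)   # first difference, same values as c(NA, diff(x)): NA 4 1

# Second difference written out with two lags: x[t] - 2 * x[t-1] + x[t-2]
d2 <- x - 2 * shift(x, 1) + shift(x, 2)
d2             # NA NA -3
```

`shift()` fills the positions that have no earlier value with NA by default, which is what produces the leading NA rows within each group.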