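I'm trying to find a solution to the following problem. My data set is fairly large, so I'd like to avoid lots of loops. I have two identifiers, var1 and var2, which are unique in combination with the date. I also have var3, a value between 0.5 (the threshold) and infinity. For each combination of var1 and var2 I want to compute the change in var3 from one date to the next, which I do with the line below, and it works like a charm: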
test = test[, test_change := var3 - shift(var3, type = "lag", n = 1), by = c("var1", "var2")]
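However, the result is incorrect when var3 is already above the 0.5 threshold on "2016-01-01". In that case I would like to use the var3 value on "2016-01-01" as the threshold until var3 falls to or below the 0.5 threshold. This is only needed when the start date is "2016-01-01". In addition, the change cannot be larger than the distance between the value and the threshold, so the part that falls below the threshold is left out. For example, in row 5, for (a, X), var3 drops from 1.5 to 0.6, but the temporary threshold is 1, so the change should be -0.5.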
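To make the row-5 arithmetic concrete, the rule can be read as a change in the excess above the temporary threshold. A small sketch of that reading (the helper name `excess_above` is made up for illustration and is not part of data.table):

```r
# Hypothetical helper: how far a value sits above a threshold, floored at 0
excess_above <- function(value, threshold) pmax(value - threshold, 0)

temp_threshold <- 1    # var3 of (a, X) on 2016-01-01
prev_value     <- 1.5  # var3 of (a, X) on 2016-01-03
curr_value     <- 0.6  # var3 of (a, X) on 2016-01-05

# Change in the excess: the part of the drop that lies below the
# temporary threshold (from 1.0 down to 0.6) is ignored
change <- excess_above(curr_value, temp_threshold) -
  excess_above(prev_value, temp_threshold)
print(change)  # -0.5
```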
Data
test = data.table(Date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-3", "2016-01-05", "2016-01-05", "2016-01-06", "2016-01-06", "2016-01-07")), var1 = c("a", "a", "b","a", "a", "a", "b", "a", "a"), var2 = c("X", "Y","X", "X", "X", "Y", "X", "X", "X"), var3 = c(1,0.75,0.5,1.5, 0.6,1.2, 0.55, 0.50, 0.75))
> test
Date var1 var2 var3
1: 2016-01-01 a X 1.00
2: 2016-01-01 a Y 0.75
3: 2016-01-01 b X 0.50
4: 2016-01-03 a X 1.50
5: 2016-01-05 a X 0.60
6: 2016-01-05 a Y 1.20
7: 2016-01-06 b X 0.55
8: 2016-01-06 a X 0.50
9: 2016-01-07 a X 0.75
Expected result
test = data.table(Date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-3", "2016-01-05", "2016-01-05", "2016-01-06", "2016-01-06", "2016-01-07")), var1 = c("a", "a", "b","a", "a", "a", "b", "a", "a"), var2 = c("X", "Y","X", "X", "X", "Y", "X", "X", "X"), var3 = c(1,0.75,0.5,1.5, 0.6,1.2, 0.55, 0.50, 0.75), change_var3 = c(0,0,0,0.5,-0.5,0.45,0.05,0,0.25))
> test
Date var1 var2 var3 change_var3
1: 2016-01-01 a X 1.00 0.00
2: 2016-01-01 a Y 0.75 0.00
3: 2016-01-01 b X 0.50 0.00
4: 2016-01-03 a X 1.50 0.50
5: 2016-01-05 a X 0.60 -0.50
6: 2016-01-05 a Y 1.20 0.45
7: 2016-01-06 b X 0.55 0.05
8: 2016-01-06 a X 0.50 0.00
9: 2016-01-07 a X 0.75 0.25
Many thanks for your help.
Answer 0: (score: 0)
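I hope I have understood your situation correctly.
The main change I made is to create the shifted variable as a separate column, and then to compute the change from it under the given conditions.
I assume that the first value of var3 in each group is the temporary threshold the rest of the group is compared against; that is the row where the lagged variable is NA.
I then updated the change column using your other conditions: if var3 is at or below the 0.5 threshold, or the row is the first value of its group, the change is set to 0.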
test = data.table(
Date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-3", "2016-01-05", "2016-01-05", "2016-01-06", "2016-01-06", "2016-01-07")),
var1 = c("a", "a", "b","a", "a", "a", "b", "a", "a"),
var2 = c("X", "Y","X", "X", "X", "Y", "X", "X", "X"),
var3 = c(1,0.75,0.5,1.5, 0.6,1.2, 0.55, 0.50, 0.75),
change_var3 = c(0,0,0,0.5,-0.5,0.45,0.05,0,0.25))
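# Previous var3 value within each (var1, var2) group; NA on each group's first row
test[, var3_lag := c(NA, var3[-.N]), by = c("var1", "var2")]
# If the previous value was above the group's first value (the temporary threshold),
# the change is that threshold minus the previous value; otherwise the plain difference
test[, test_change := ifelse(var3_lag > var3[is.na(var3_lag)],
var3[is.na(var3_lag)] - var3_lag,
var3 - var3_lag),
by = c("var1", "var2")]
# First rows and rows at or below 0.5 get a change of 0
test[is.na(var3_lag) | var3 <= 0.5, test_change := 0]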
The result is:
> test
Date var1 var2 var3 change_var3 var3_lag test_change
1: 2016-01-01 a X 1.00 0.00 NA 0.00
2: 2016-01-01 a Y 0.75 0.00 NA 0.00
3: 2016-01-01 b X 0.50 0.00 NA 0.00
4: 2016-01-03 a X 1.50 0.50 1.00 0.50
5: 2016-01-05 a X 0.60 -0.50 1.50 -0.50
6: 2016-01-05 a Y 1.20 0.45 0.75 0.45
7: 2016-01-06 b X 0.55 0.05 0.50 0.05
8: 2016-01-06 a X 0.50 0.00 0.60 0.00
9: 2016-01-07 a X 0.75 0.25 0.50 0.25
Is this what you need?
Answer 1: (score: 0)
I was able to solve my own problem; I hope this helps others who run into the same issue.
library(data.table)
test = data.table(Date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-3", "2016-01-05", "2016-01-05", "2016-01-06", "2016-01-06", "2016-01-07","2016-01-08")), var1 = c("a", "a", "b","a", "a", "a", "b", "a", "a", "a"), var2 = c("X", "Y","X", "X", "X", "Y", "X", "X", "X", "X"), var3 = c(1,0.75,0.5,1.5, 0.6,1.2, 0.55, 0.50, 0.75, 0.4))
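# Clamp everything at or below the 0.5 threshold to 0.5
test[var3 <= 0.5, var3 := 0.5]
# Temporary threshold: the var3 value on the start date
test[, test_threshold := ifelse(Date == "2016-01-01", var3, NA)]
# Use the temporary threshold while var3 and its previous value are above 0.5, otherwise 0.5
test[, test := ifelse(var3 > 0.5 & (shift(var3, n = 1, type = "lag") > 0.5 | is.na(shift(var3, n = 1, type = "lag"))), test_threshold[1], 0.5), by = c("var1", "var2")]
# Excess above the threshold in effect, floored at 0
test[, var5 := var3 - test]
test[var5 < 0, var5 := 0]
# Change in that excess from one date to the next within each group
test[, var5_change := var5 - shift(var5, n = 1, type = "lag"),
by = c("var1", "var2")]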
> test
Date var1 var2 var3 test_threshold test var5 var5_change
1: 2016-01-01 a X 1.00 1.00 1.00 0.00 NA
2: 2016-01-01 a Y 0.75 0.75 0.75 0.00 NA
3: 2016-01-01 b X 0.50 0.50 0.50 0.00 NA
4: 2016-01-03 a X 1.50 NA 1.00 0.50 0.50
5: 2016-01-05 a X 0.60 NA 1.00 0.00 -0.50
6: 2016-01-05 a Y 1.20 NA 0.75 0.45 0.45
7: 2016-01-06 b X 0.55 NA 0.50 0.05 0.05
8: 2016-01-06 a X 0.50 NA 0.50 0.00 0.00
9: 2016-01-07 a X 0.75 NA 0.50 0.25 0.25
10: 2016-01-08 a X 0.50 NA 0.50 0.00 -0.25
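One thing worth checking in the code above: as far as I can tell, the threshold is picked from the current and previous row only, so the temporary threshold would come back into effect if var3 stayed above 0.5 for two consecutive rows after having touched 0.5. If the temporary threshold should instead be dropped for good once var3 has reached 0.5, a cumulative flag is one way to express that. This is only a sketch under that assumption: the names `floor_thr`, `var3c`, `touched`, `thr`, `excess` and `change` are made up here, and the rows are assumed to already be in date order within each group.

```r
library(data.table)

test <- data.table(
  Date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01", "2016-01-03",
                   "2016-01-05", "2016-01-05", "2016-01-06", "2016-01-06",
                   "2016-01-07", "2016-01-08")),
  var1 = c("a", "a", "b", "a", "a", "a", "b", "a", "a", "a"),
  var2 = c("X", "Y", "X", "X", "X", "Y", "X", "X", "X", "X"),
  var3 = c(1, 0.75, 0.5, 1.5, 0.6, 1.2, 0.55, 0.50, 0.75, 0.4)
)

floor_thr <- 0.5

# Clamped copy of var3, so the original column is left untouched
test[, var3c := pmax(var3, floor_thr)]

# TRUE from the first row on which the group has reached the 0.5 floor
test[, touched := cumsum(var3c <= floor_thr) > 0, by = .(var1, var2)]

# Threshold in effect: the group's first value (only if the group starts on
# 2016-01-01) until the floor has been reached, the floor itself afterwards
test[, thr := {
  start <- if (Date[1] == as.Date("2016-01-01")) var3c[1] else floor_thr
  ifelse(touched, floor_thr, start)
}, by = .(var1, var2)]

# Excess above the threshold in effect, and its change within each group
test[, excess := pmax(var3c - thr, 0)]
test[, change := excess - shift(excess, n = 1, type = "lag"), by = .(var1, var2)]
test[is.na(change), change := 0]  # first row of each group has no predecessor

print(test)
```

On the sample data, the first nine values of `change` agree with `change_var3` from the question; the two approaches only differ on series that recover above 0.5 for more than one row.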