My data looks like this:
order val1 val2 win
1: 1 8.5 6.0 NA
2: 2 7.0 5.0 NA
3: 3 6.0 5.0 NA
4: 4 6.0 5.0 NA
5: 5 6.0 5.0 NA
6: 6 8.0 7.0 NA
7: 7 5.0 4.0 NA
8: 8 5.0 4.0 NA
9: 9 5.0 3.0 NA
10: 10 7.0 2.0 NA
11: 11 4.0 3.0 NA
12: 12 4.0 3.0 NA
13: 13 3.0 2.5 NA
14: 14 6.0 5.0 NA
15: 15 3.0 2.5 1
16: 16 2.0 1.0 NA
17: 17 5.0 3.5 NA
18: 18 3.0 2.7 NA
19: 19 2.5 1.7 NA
........ etc ..........
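What I am struggling with is creating a new column that starts counting in the row after win == 1. On top of that, a row is only counted if its val1 is lower than val2 of the previous row. The count keeps going as long as that holds; rows that fail the criterion are skipped, until seven rows have been counted in total. Like so: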
order val1 val2 win cond_win
14: 14 6.0 5.0 NA NA
15: 15 3.0 2.5 1 NA
16: 16 2.0 1.0 NA 1
17: 17 5.0 3.5 NA NA
18: 18 3.0 2.7 NA 2
19: 19 2.5 1.7 NA 3
20: 20 1.5 1.3 NA 4
21: 21 1.2 0.5 1 5
22: 22 6.0 5.5 NA NA
23: 23 5.0 4.5 NA 6
24: 24 4.0 3.5 NA 7
25: 25 3.0 2.5 NA NA
26: 26 2.0 1.5 NA NA
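This is where I want it to reset and start looking again. I am currently struggling with the skipping part.
A loop is the route I was trying to take, but it would also be too slow.
Is there an elegant solution in data.table that might be faster?
Here is some data, and what I have come up with so far.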
library(data.table)
# the value vectors are shorter than 50, so recycle them explicitly to 50 rows
DT <- data.table(order = seq(1, 50, 1),
  val1 = rep_len(c(8.5,7,6,6,6,8,5,5,5,7,4,4,3,6,3,2,5,3,2.5,1.5,1.2,6,5,4,3,2), 50),
  val2 = rep_len(c(6,5,5,5,5,7,4,4,3,2,3,3,2.5,5,2.5,1,3.5,2.7,1.7,1.3,0.5,5.5,4.5,3.5,2.5,1.5), 50),
  win = rep_len(c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1), 50))
# find the first
DT[win == 1 & val1 < shift(val2, 1), cond_win := 1]
# attempt at looping
for (i in 1:7) {
  DT[shift(cond_win, 1) == i & val1 < shift(val2, 1), cond_win := i + 1]
}
DT
Answer 0 (score: 1)
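The OP has made clear that four conditions must be met for rows to be counted:
1. Counting starts in the row after win == 1.
2. Only rows where val1 is lower than val2 of the previous row are counted.
3. Counting stops after the seventh counted row.
4. Once counting has started, any intermediate occurrence of win == 1 does not restart the count.
The difficulty here is the irregularity of the win == 1 and val1[i] < val2[i-1] patterns and the interdependence of the conditions. So this problem cannot be fully vectorised. We still need a loop over the occurrences of win == 1: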
# find all appearances of win == 1, remember row number of next row,
# ensure start is a valid row number (no overrun in case last row has win == 1)
start <- DT[win == 1 & order < .N, order + 1]
DT[, cond_val := val1 < shift(val2, fill = FALSE) ]
DT[order >= first(start), cond_win := cumsum(cond_val)]
# implied loop over all appearances of win == 1
dummy <- lapply(start, function(start) {
if (DT[start, cond_win > 7]) {
# restart count from this row
DT[order >= start, cond_win := cumsum(cond_val)]
}
})
# rows which don't satisfy the conditions become NA
DT[!cond_win %between% c(1,7) | !cond_val, cond_win := NA]
DT
The result is as follows:
order val1 val2 win cond_val cond_win
1: 1 8.5 6.0 NA FALSE NA
2: 2 7.0 5.0 NA FALSE NA
3: 3 6.0 5.0 NA FALSE NA
4: 4 6.0 5.0 NA FALSE NA
5: 5 6.0 5.0 NA FALSE NA
6: 6 8.0 7.0 NA FALSE NA
7: 7 5.0 4.0 NA TRUE NA
8: 8 5.0 4.0 NA FALSE NA
9: 9 5.0 3.0 NA FALSE NA
10: 10 7.0 2.0 NA FALSE NA
11: 11 4.0 3.0 NA FALSE NA
12: 12 4.0 3.0 NA FALSE NA
13: 13 3.0 2.5 NA FALSE NA
14: 14 6.0 5.0 NA FALSE NA
15: 15 3.0 2.5 1 TRUE NA
16: 16 2.0 1.0 NA TRUE 1
17: 17 5.0 3.5 NA FALSE NA
18: 18 3.0 2.7 NA TRUE 2
19: 19 2.5 1.7 NA TRUE 3
20: 20 1.5 1.3 NA TRUE 4
21: 21 1.2 0.5 1 TRUE 5
22: 22 6.0 5.5 NA FALSE NA
23: 23 5.0 4.5 NA TRUE 6
24: 24 4.0 3.5 NA TRUE 7
25: 25 3.0 2.5 NA TRUE NA
26: 26 2.0 1.5 NA TRUE NA
27: 27 8.5 6.0 NA FALSE NA
28: 28 7.0 5.0 NA FALSE NA
29: 29 6.0 5.0 NA FALSE NA
30: 30 6.0 5.0 NA FALSE NA
31: 31 6.0 5.0 NA FALSE NA
32: 32 8.0 7.0 NA FALSE NA
33: 33 5.0 4.0 1 TRUE NA
34: 34 5.0 4.0 NA FALSE NA
35: 35 5.0 3.0 NA FALSE NA
36: 36 7.0 2.0 NA FALSE NA
37: 37 4.0 3.0 NA FALSE NA
38: 38 4.0 3.0 NA FALSE NA
39: 39 3.0 2.5 NA FALSE NA
40: 40 6.0 5.0 NA FALSE NA
41: 41 3.0 2.5 NA TRUE 1
42: 42 2.0 1.0 NA TRUE 2
43: 43 5.0 3.5 NA FALSE NA
44: 44 3.0 2.7 1 TRUE 3
45: 45 2.5 1.7 NA TRUE 4
46: 46 1.5 1.3 NA TRUE 5
47: 47 1.2 0.5 NA TRUE 6
48: 48 6.0 5.5 NA FALSE NA
49: 49 5.0 4.5 NA TRUE 7
50: 50 4.0 3.5 NA TRUE NA
order val1 val2 win cond_val cond_win
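To cross-check a result like the one above, the four conditions can also be written as a plain sequential loop. The following is only a minimal base R sketch, not the answer's method: the function name count_after_win and its max_count parameter are made up for illustration, and it assumes that a win == 1 falling exactly on the seventh counted row starts a new count on the next row.

```r
# Hypothetical helper (not part of data.table): walks the rows once and
# keeps a small state machine (counting yes/no, current count k).
count_after_win <- function(val1, val2, win, max_count = 7L) {
  n <- length(val1)
  out <- rep(NA_integer_, n)
  counting <- FALSE
  k <- 0L
  for (i in seq_len(n)) {
    # while a count is running, count rows whose val1 is below the previous val2
    if (counting && i > 1L && val1[i] < val2[i - 1L]) {
      k <- k + 1L
      out[i] <- k
      if (k == max_count) counting <- FALSE  # stop after the last counted row
    }
    # a win starts a new count on the NEXT row, but only if no count is running
    # (assumption: a win on the last counted row does start a new count)
    if (!counting && isTRUE(win[i] == 1)) {
      counting <- TRUE
      k <- 0L
    }
  }
  out
}

# the first 26 rows of the question's data
val1 <- c(8.5,7,6,6,6,8,5,5,5,7,4,4,3,6,3,2,5,3,2.5,1.5,1.2,6,5,4,3,2)
val2 <- c(6,5,5,5,5,7,4,4,3,2,3,3,2.5,5,2.5,1,3.5,2.7,1.7,1.3,0.5,5.5,4.5,3.5,2.5,1.5)
win <- rep(NA_real_, 26)
win[c(15, 21)] <- 1

res <- count_after_win(val1, val2, win)
res[14:26]
# NA NA 1 NA 2 3 4 5 NA 6 7 NA NA
```

Applied to the DT above, something like DT[, cond_check := count_after_win(val1, val2, win)] (cond_check being a made-up column name) should give a column that can be compared with cond_win, including for edge cases such as a win == 1 that occurs right after a count has completed.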
Answer 1 (score: 0)
# data generation
exempleData <- data.frame(order = 1:50, val1 = runif(50), val2 = runif(50))
# draw the win row from 1:49 so that at least one row follows it
exempleData$win <- NA; exempleData$win[sample(1:49, 1)] <- 1
exempleData$cond_win <- NA
# rows after the row where win == 1
idx <- (which(exempleData$win == 1) + 1):nrow(exempleData)
# among those rows, assign 1 to cond_win where val1 is below val2 of the previous row
exempleData$cond_win[idx][exempleData$val2[idx - 1] > exempleData$val1[idx]] <- 1
# turn the 1s into a running count
exempleData$cond_win[!is.na(exempleData$cond_win)] <- cumsum(exempleData$cond_win[!is.na(exempleData$cond_win)])
# remove counts greater than 7
exempleData$cond_win[exempleData$cond_win > 7] <- NA
# Here we are!
exempleData