I have data like the following:
rno id day val
0 1 1 7
1 1 2 5
2 1 3 10
3 1 4 10
4 1 5 11
5 1 6 11
6 1 7 14
7 1 8 14
20 2 1 5
21 2 2 7
22 2 3 8
23 2 4 8
24 2 5 9
25 2 6 9
26 2 7 13
27 2 8 13
28 2 9 15
29 2 10 15
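I want to create a new column, `fake_flag`, and fill it with the value `fake_val` based on the following two rules.
Rule 1 - For each value (`n`), check whether the previous two rows (`n-1`, `n-2`) are constant or decreasing (e.g. 7,5 or 5,5 is valid, whereas 5,7 is invalid because it is increasing and not constant), and take their maximum as the output. For 7,5 the output is 7; for 5,5 the output is 5.
Rule 2 - Check whether the current value (`n`) and the next value (`n+1`) are both at least 3 points higher (>= 3) than the maximum from rule 1. For example, if rule 1 outputs 5, we expect to see at least 8 (`n`), 8 (`n+1`). It could also be 9,9 or 10,10.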
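To make the two rules concrete, here is a minimal Python sketch of how I read them, applied to the `val` column of a single `id`. The helper names `rule1_max` and `is_fake` are made up for illustration, and the sketch assumes rule 2 only requires both `n` and `n+1` to be at least 3 above the rule 1 maximum (it does not additionally require them to be equal).

```python
from typing import List, Optional


def rule1_max(prev2: int, prev1: int) -> Optional[int]:
    """Rule 1: if rows n-2, n-1 are constant or decreasing, return their max.

    Returns None when the pair is increasing (e.g. 5,7), i.e. rule 1 fails.
    """
    if prev1 <= prev2:
        return prev2  # n-2 is the larger (or equal) of the two
    return None


def is_fake(vals: List[int], n: int) -> bool:
    """Apply rule 1 and rule 2 to position n of one id's values."""
    # Rows without two predecessors or without a successor cannot qualify.
    if n < 2 or n + 1 >= len(vals):
        return False
    base = rule1_max(vals[n - 2], vals[n - 1])
    if base is None:
        return False
    # Rule 2: both the current and the next value are >= base + 3.
    return vals[n] - base >= 3 and vals[n + 1] - base >= 3


vals = [7, 5, 10, 10, 11, 11, 14, 14]  # id 1 from the sample data
flags = [is_fake(vals, n) for n in range(len(vals))]
print(flags)  # [False, False, True, False, False, False, True, False]
```

The two `True` positions correspond to day 3 and day 7 of id 1.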
I expect my output data to look like this:
rno id day val fake_flag
0 1 1 7
1 1 2 5
2 1 3 10 fake_val # >= 3 from max of preceding 2 rows and `n` and `n+1` is same
3 1 4 10
4 1 5 11
5 1 6 11
6 1 7 14 fake_val # >= 3 from max of preceding 2 rows and `n` and `n+1` is same
7 1 8 14
20 2 1 5
21 2 2 7
22 2 3 8
23 2 4 8
24 2 5 9
25 2 6 9
26 2 7 13 fake_val # >= 3 from max of preceding 2 rows and `n` and `n+1` is same
27 2 8 13
28 2 9 15
29 2 10 15
Answer 0 (score: 2)
This should do what you want. I tested it with dummy data, but let me know if I misunderstood something and I can revise it.
SELECT *
, CASE WHEN
    -- Rule 1: rows n-2 and n-1 are constant or decreasing
    (LAG(val, 1) OVER w <= LAG(val, 2) OVER w) AND
    (val = LEAD(val, 1) OVER w) AND -- n = n + 1, part of rule 2
    -- Can assume row n-2 is the max because it will either be the same as row n-1 or greater than row n-1 for rule 1 to be satisfied
    (val - LAG(val, 2) OVER w >= 3) -- Only have to check current row val because for first part of rule 2 to be satisfied val for row n must equal val for row n + 1
  THEN 'fake_val' -- I would just have a 1 representing it is true and then 0 if not, but up to you
  ELSE NULL
  END AS fake_flag
FROM Dataset.Table_name
-- Partition by id so rows of one id are never compared with another id's rows;
-- LAG/LEAD do not use a window frame, so none is specified
WINDOW w AS (PARTITION BY id ORDER BY rno)
Answer 1 (score: 1)
Below is for BigQuery Standard SQL
#standardSQL
SELECT rno, id, day, val,
IF(IFNULL(val_prev2 >= val_prev1, FALSE) -- rule 1: previous two rows constant or decreasing
AND ( -- both rules must hold, so combine with AND
(val - GREATEST(val_prev2, val_prev1) >= 3) -- rule 2 for val(n)
AND (val_next - GREATEST(val_prev2, val_prev1) >= 3) -- rule 2 for val(n+1)
),
'fake_val', ''
) AS fake_flag
FROM (
SELECT *,
LAG(val) OVER(PARTITION BY id ORDER BY day) val_prev1,
LAG(val, 2) OVER(PARTITION BY id ORDER BY day) val_prev2,
LEAD(val) OVER(PARTITION BY id ORDER BY day) val_next
FROM `project.dataset.table`
)
When applied to the sample data from your question, the result is:
Row rno id day val fake_flag
1 0 1 1 7
2 1 1 2 5
3 2 1 3 10 fake_val
4 3 1 4 10
5 4 1 5 11
6 5 1 6 11
7 6 1 7 14 fake_val
8 7 1 8 14
9 20 2 1 5
10 21 2 2 7
11 22 2 3 8
12 23 2 4 8
13 24 2 5 9
14 25 2 6 9
15 26 2 7 13 fake_val
16 27 2 8 13
17 28 2 9 15
18 29 2 10 15
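As a rough cross-check of the table above, the following Python sketch emulates what `LAG`/`LEAD` with `PARTITION BY id ORDER BY day` produce and then applies the rules per row. It is not BigQuery code, and `flag_partition` is a hypothetical helper. It assumes rule 1 and rule 2 must both hold, and that a missing neighbour (a SQL `NULL` at a partition edge) means the row is not flagged.

```python
from itertools import groupby
from operator import itemgetter

# Sample data from the question: (rno, id, day, val)
rows = [
    (0, 1, 1, 7), (1, 1, 2, 5), (2, 1, 3, 10), (3, 1, 4, 10),
    (4, 1, 5, 11), (5, 1, 6, 11), (6, 1, 7, 14), (7, 1, 8, 14),
    (20, 2, 1, 5), (21, 2, 2, 7), (22, 2, 3, 8), (23, 2, 4, 8),
    (24, 2, 5, 9), (25, 2, 6, 9), (26, 2, 7, 13), (27, 2, 8, 13),
    (28, 2, 9, 15), (29, 2, 10, 15),
]


def flag_partition(part):
    """part: rows of a single id, sorted by day. Returns [(rno, flag), ...]."""
    vals = [r[3] for r in part]
    out = []
    for i, r in enumerate(part):
        prev1 = vals[i - 1] if i >= 1 else None            # LAG(val)
        prev2 = vals[i - 2] if i >= 2 else None            # LAG(val, 2)
        nxt = vals[i + 1] if i + 1 < len(vals) else None   # LEAD(val)
        ok = (
            None not in (prev1, prev2, nxt)      # edges of the partition
            and prev2 >= prev1                   # rule 1
            and r[3] - max(prev1, prev2) >= 3    # rule 2 for val(n)
            and nxt - max(prev1, prev2) >= 3     # rule 2 for val(n+1)
        )
        out.append((r[0], "fake_val" if ok else ""))
    return out


rows.sort(key=itemgetter(1, 2))  # order by id, then day
result = []
for _, part in groupby(rows, key=itemgetter(1)):  # one group per id
    result.extend(flag_partition(list(part)))

flagged = [rno for rno, flag in result if flag]
print(flagged)  # [2, 6, 26]
```

Because each `id` is processed separately, the first two rows of id 2 never see the last rows of id 1 as their predecessors.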