For each row, I want to count how many rows of type A occur before it on the same day.
Table:
|day |time|type|1 |2 |3 |4 |
|Mon |9 |A |a1 |a2 |a3 |a4 |
|Mon |10 |B |b1 |b2 |b3 |b4 |
|Mon |12 |A |c1 |c2 |c3 |c4 |
|Mon |20 |C |d1 |d2 |d3 |d4 |
|Tue |9 |A |e1 |e2 |e3 |e4 |
|Tue |10 |B |f1 |f2 |f3 |f4 |
|Tue |11 |B |g1 |g2 |g3 |g4 |
|Tue |12 |C |h1 |h2 |h3 |h4 |
|Wed |9 |A |i1 |i2 |i3 |i4 |
Desired output:
|day |time|type|1 |2 |3 |4 |#A |
|Mon |9 |A |a1 |a2 |a3 |a4 |0 |
|Mon |10 |B |b1 |b2 |b3 |b4 |1 |
|Mon |12 |A |c1 |c2 |c3 |c4 |1 |
|Mon |20 |C |d1 |d2 |d3 |d4 |2 |
|Tue |9 |A |e1 |e2 |e3 |e4 |0 |
|Tue |10 |B |f1 |f2 |f3 |f4 |1 |
|Tue |11 |B |g1 |g2 |g3 |g4 |1 |
|Tue |12 |C |h1 |h2 |h3 |h4 |1 |
|Wed |9 |A |i1 |i2 |i3 |i4 |0 |
I tried running:
select
x.day,x.time,x.type,x.1,x.2,x.3,x.4,count(*) as #A
from
Table as x
left outer join
Table(where type=A) as y
on
y.day = x.day
and
y.time < x.time
group by
x.day,x.time,x.type,x.1,x.2,x.3,x.4
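But this does not give me the right answer, because the #A values come out too low.
The table is very large, with more than 30 million rows and more than 50k rows of type A, which in turn makes the query hit my memory limit, and that does not help with debugging the error...
I also need to run this as a scheduled job. I have considered using Dataflow or Dataprep to help, but I would rather solve it in BigQuery alone.
Any help would be much appreciated.
Thanks!
Edit: I am trying to debug the query on a smaller dataset.
Answer 0 (score: 1)
Below is for BigQuery Standard SQL: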
#standardSQL
SELECT *,
COUNTIF(type = 'A') OVER(
PARTITION BY day ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) Count_of_A
FROM `project.dataset.table`
The result is:
Row day time type col1 col2 col3 col4 Count_of_A
1 Mon 9 A a1 a2 a3 a4 0
2 Mon 10 B b1 b2 b3 b4 1
3 Mon 12 A c1 c2 c3 c4 1
4 Mon 20 C d1 d2 d3 d4 2
5 Tue 9 A e1 e2 e3 e4 0
6 Tue 10 B f1 f2 f3 f4 1
7 Tue 11 B g1 g2 g3 g4 1
8 Tue 12 C h1 h2 h3 h4 1
9 Wed 9 A i1 i2 i3 i4 0
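To sanity-check the window logic on a small dataset outside BigQuery, the same computation can be sketched in plain Python. This is only an illustrative reference, not part of the answer's query: the names `rows` and `count_prior_a` are hypothetical, and the col1..col4 payload columns are left out because they do not affect the count. It mirrors `PARTITION BY day ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING` by grouping on day, sorting by time, and recording the number of A rows seen so far before counting the current row.

```python
from collections import defaultdict

# Sample rows from the question as (day, time, type).
# The col1..col4 payload is omitted; it does not influence the count.
rows = [
    ("Mon", 9, "A"), ("Mon", 10, "B"), ("Mon", 12, "A"), ("Mon", 20, "C"),
    ("Tue", 9, "A"), ("Tue", 10, "B"), ("Tue", 11, "B"), ("Tue", 12, "C"),
    ("Wed", 9, "A"),
]


def count_prior_a(rows):
    """For each input row, count type-A rows earlier on the same day.

    Returns a list of counts aligned with the input order.
    """
    # Group rows by day, remembering each row's original position.
    by_day = defaultdict(list)
    for idx, (day, time, typ) in enumerate(rows):
        by_day[day].append((time, idx, typ))

    counts = [0] * len(rows)
    for entries in by_day.values():
        seen_a = 0
        # Sort by time within the day; ties fall back to input order here.
        for _time, idx, typ in sorted(entries):
            counts[idx] = seen_a  # frame ends at "1 PRECEDING"
            if typ == "A":
                seen_a += 1       # current row counts only for later rows
    return counts


print(count_prior_a(rows))
```

One difference worth noting: this sketch breaks ties on `time` by input order, whereas in SQL the relative order of rows sharing the same `day` and `time` is not guaranteed, so a `ROWS` frame can give varying counts for such ties unless the `ORDER BY` includes a tiebreaker column.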