这里有类似的问题,但要么我无法弄清楚如何转换到我的情况(可能),或者他们不是那么相似,但阅读接近我想做的事情(BigQuery: How to calculate the running count of distinct visitors for each day and category?)
总之...
我在bigquery中有一个订单表,它有许多列标题,我需要使用它们,但我会在这里列出其中一些
orderID,customerID,transactionDate,Revenue
(我需要获得所有字段)
我想在表中计算出客户ID的实例作为新列,所以如果我有3个订单,而我的客户ID是1234,那么数据表中的第一个实例将是1列,第二个是2,第三个是3
例如说我的数据看起来像这样
> OrderID || CustomerID || TransactionDate || Revenue
> 1 || 1 || 01/01/15 || £20
> 2 || 2 || 01/01/15 || £20
> 3 || 3 || 01/01/15 || £20
> 4 || 1 || 01/01/15 || £20
> 5 || 1 || 01/01/15 || £20
> 6 || 2 || 01/01/15 || £20
> 7 || 4 || 01/01/15 || £20
我想对它运行一个查询,在新列中添加一个新的列,说明如果CustomerID记录的实例,那么它就像
> OrderID || CustomerID || TransactionDate || Revenue ||Instance
> 1 || 1 || 01/01/15 || £20 ||1
> 2 || 2 || 01/01/15 || £20 ||1
> 3 || 3 || 01/01/15 || £20 ||1
> 4 || 1 || 01/01/15 || £20 ||2
> 5 || 1 || 01/01/15 || £20 ||3
> 6 || 2 || 01/01/15 || £20 ||2
> 7 || 4 || 01/01/15 || £20 ||1
每次发生已经看到的customerID时,Instance会增加1
另外,我需要针对一个目前在160万行的不断增长的表运行它。
我希望有人可以帮助我。
干杯
约翰
答案 0 :(得分:2)
Window Functions正在帮助您:
窗口函数可以对结果集的特定分区或“窗口”进行计算。每个窗口函数都需要一个OVER子句来指定分区,使用以下语法:
OVER (
[PARTITION BY <expr>]
[ORDER BY <expr>]
[ROWS <expr> | RANGE <expr>]
)
PARTITION BY
始终是可选的。 ORDER BY
在某些情况下是可选的,但某些窗口函数(例如rank()
或dense_rank()
)需要该子句。
JOIN EACH
和GROUP EACH BY
子句不能用于窗口函数的输出。要在使用窗口函数时生成大型查询结果,必须使用PARTITION BY
。
select *,
row_number() over (partition by CustomerID order by TransactionDate) as Instance
from (select 1 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue),
(select 2 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue),
(select 3 as OrderID, 3 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue),
(select 4 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue),
(select 5 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue),
(select 6 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue),
(select 7 as OrderID, 4 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue)
order by OrderID
返回:
+-----+---------+------------+-----------------+---------+----------+---+
| Row | OrderID | CustomerID | TransactionDate | Revenue | Instance | |
+-----+---------+------------+-----------------+---------+----------+---+
| 1 | 1 | 1 | 01/01/15 | £20 | 1 | |
| 2 | 2 | 2 | 01/01/15 | £20 | 1 | |
| 3 | 3 | 3 | 01/01/15 | £20 | 1 | |
| 4 | 4 | 1 | 01/01/15 | £20 | 2 | |
| 5 | 5 | 1 | 01/01/15 | £20 | 3 | |
| 6 | 6 | 2 | 01/01/15 | £20 | 2 | |
| 7 | 7 | 4 | 01/01/15 | £20 | 1 | |
+-----+---------+------------+-----------------+---------+----------+---+
答案 1 :(得分:1)
你应该使用像row_number OVER这样的窗口函数(bygroupbyfields order by transaction_date分区)