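BigQuery: How to get a running total over time

Time: 2017-03-22 12:50:33

Tags: mysql sql google-bigquery

I have a BigQuery table that records when items are purchased from a store. It contains an ItemID and a timestamp. I'm interested in the running total of purchases for each item. I have this query, which produces the running total: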

SELECT ItemID,timestamp,count(*)
OVER
  (PARTITION BY ItemID
  ORDER BY timestamp ASC, ItemID) AS runningtotal
from 
(
  SELECT * FROM [mydb.purchases] 
)
ORDER BY timestamp

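The table contains hundreds of thousands of rows. What I'd like to do now is take a period of time (say, one week) and get 100 samples of the running total for each ItemID within that week, so that I can plot a chart without too many data points. I'm not sure how to do this. I can get 100 samples by filtering with something like "where (rownumber % (rowcount / 100)) = 0", but how do I do that for every ItemID in the table? Do I need to run a separate subquery for each ItemID and then union the results? Thanks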

2 answers:

Answer 0: (score: 0)

With standard SQL, you can first collect a sample of 100 timestamps per item by using the LIMIT clause inside the ARRAY_AGG function:

#standardSQL
SELECT ItemID, timestamp, COUNT(*)
OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS running_total
FROM (
  -- GROUP BY is needed so that ARRAY_AGG collects the timestamps per ItemID
  SELECT ItemID, ARRAY_AGG(timestamp LIMIT 100) AS timestamps
  FROM `mydb.purchases`
  GROUP BY ItemID
) t, t.timestamps AS timestamp
ORDER BY timestamp

If this does not give you a random sample, you can shuffle the timestamps with RAND():

#standardSQL
SELECT ItemID, timestamp, COUNT(*)
OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS running_total
FROM (
  -- GROUP BY is needed so that ARRAY_AGG collects the timestamps per ItemID
  SELECT ItemID, ARRAY_AGG(timestamp ORDER BY RAND() LIMIT 100) AS timestamps
  FROM `mydb.purchases`
  GROUP BY ItemID
) t, t.timestamps AS timestamp
ORDER BY timestamp
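
One thing worth checking with this approach: the window function runs over whatever rows come out of the subquery, so COUNT(*) is computed over the sampled timestamps rather than over all purchases. The following is a minimal Python sketch of that ordering effect, not BigQuery code. The helper names (running_totals, sample_per_item) and the data are made up for illustration.

```python
import random
from collections import defaultdict


def running_totals(rows):
    """Return (item_id, timestamp, running_total) per row, counted per item in timestamp order."""
    counts = defaultdict(int)
    out = []
    for row in sorted(rows, key=lambda r: (r[1], r[0])):
        item_id, ts = row[0], row[1]
        counts[item_id] += 1
        out.append((item_id, ts, counts[item_id]))
    return out


def sample_per_item(rows, n, seed=0):
    """Keep at most n random rows per item (a rough analogue of ARRAY_AGG(... ORDER BY RAND() LIMIT n))."""
    rng = random.Random(seed)
    by_item = defaultdict(list)
    for row in rows:
        by_item[row[0]].append(row)
    sampled = []
    for item_rows in by_item.values():
        sampled.extend(rng.sample(item_rows, min(n, len(item_rows))))
    return sampled


# Hypothetical data: item "A" is purchased once at each of t = 1..1000,
# so the true running total at time t is simply t.
purchases = [("A", t) for t in range(1, 1001)]

# Sample first, then count: the count only sees the 100 sampled rows.
sample_then_count = running_totals(sample_per_item(purchases, 100))

# Count over all rows first, then sample: the true totals are preserved.
count_then_sample = sample_per_item(running_totals(purchases), 100)

print(max(r[2] for r in sample_then_count))  # 100, because only sampled rows are counted
print(max(r[2] for r in count_then_sample))  # the true total at the latest sampled timestamp
```

If the chart needs the true totals, computing the running total over all rows first and sampling afterwards keeps them intact.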

Answer 1: (score: 0)

Below is what you described, as far as the sampling goes. I left out the part about selecting a week's worth of data because it is trivial.

#standardSQL
SELECT
  ItemID,
  timestamp,
  runningtotal  
FROM (
  SELECT 
    ItemID, 
    timestamp, 
    COUNT(1) OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS runningtotal,
    ROW_NUMBER() OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS rownumber,
    COUNT(1) OVER(PARTITION BY ItemID) AS rowcount
  FROM `mydb.purchases`
)
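-- GREATEST(..., 1) keeps the step from rounding down to 0 when an ItemID has fewer than 50 rows
WHERE MOD(rownumber, GREATEST(CAST(rowcount / 100 AS INT64), 1)) = 0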
-- ORDER BY ItemID, timestamp
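
The number of rows this filter keeps depends on how rowcount / 100 is turned into an integer step. The Python sketch below emulates that arithmetic for a few partition sizes. It assumes that CAST(FLOAT64 AS INT64) rounds halfway cases away from zero, as the BigQuery conversion rules describe, and the function names are hypothetical.

```python
import math


def bq_cast_to_int(x):
    """Emulate CAST(x AS INT64) for non-negative x: round half away from zero (assumed behaviour)."""
    return int(math.floor(x + 0.5))


def sampled_rows(rowcount, target=100, guard=True):
    """Return the row numbers (1..rowcount) that pass MOD(rownumber, step) = 0."""
    step = bq_cast_to_int(rowcount / target)
    if guard:
        # Clamp the step to at least 1 so the modulo never divides by zero.
        step = max(step, 1)
    return [n for n in range(1, rowcount + 1) if n % step == 0]


# Print the partition size next to the number of rows that survive the filter.
for rowcount in (30, 149, 150, 1000, 100000):
    print(rowcount, len(sampled_rows(rowcount)))
```

The guard flag clamps the step to at least 1. Without such a clamp, a partition with fewer than 50 rows yields a step of 0 and the modulo fails with a division-by-zero error. Because of the rounding, small partitions also do not give exactly 100 rows: 149 rows keep all 149, while 150 rows keep 75.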