BigQuery - Shuffle error

Asked: 2013-04-13 04:40:35

Tags: google-bigquery

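I have a table with roughly 5M rows. Note that this is only a POC; eventually we will need to work in the TB range. I am doing a self-join to find product permutations for market basket analysis.

I need to find the number of times a combination occurs in a basket, the ratio of those occurrences to the total number of baskets, and the number of times an item occurs across all baskets. This is pretty standard. BigQuery does not support a select inside the predicate of another select, so I believe I need to create another join. This is what I came up with: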

select twoItem.upc1,twoItem.upc2,twoItem.twoItemOccurrences, totalUpc.totalUpcCount
from
(
    select purchase1.upc as upc1,purchase2.upc as upc2,count(upc1) as twoItemOccurrences
    from
    conagra.purchase as purchase1
    join each conagra.purchase as purchase2
    on purchase1.upc = purchase2.upc
    group by upc1,upc2
) as twoItem
JOIN EACH 
(
    select purchase3.upc as upc3, count(*) as totalUpcCount
    from conagra.purchase as purchase3
    group by upc3
) as totalUpc
on totalUpc.upc3 = twoItem.upc1
LIMIT 50;

I get the following error:

SHUFFLE BY may only apply to parallelizable queries, but query is not parallelizable: (SELECT * FROM (SELECT [purchase3.upc] AS [upc3], COUNT(*) AS [totalUpcCount]...

Perhaps an undocumented limitation?

Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 1)

Try running this with GROUP EACH BY on the inner queries. We will improve the error message returned for this kind of query.
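
As a rough sketch of what that suggestion might look like, here is the query from the question with GROUP BY changed to GROUP EACH BY in both inner queries. The table and column names are taken from the question, and this form has not been run against the asker's data, so treat it as an illustration rather than a verified fix. Everything else is left as the asker wrote it, including the self-join on upc.

```sql
-- Sketch only: the question's query with GROUP EACH BY on both inner queries.
-- Written in the same BigQuery SQL dialect the question uses (JOIN EACH).
select twoItem.upc1, twoItem.upc2, twoItem.twoItemOccurrences, totalUpc.totalUpcCount
from
(
    select purchase1.upc as upc1, purchase2.upc as upc2, count(upc1) as twoItemOccurrences
    from
    conagra.purchase as purchase1
    join each conagra.purchase as purchase2
    on purchase1.upc = purchase2.upc
    group each by upc1, upc2   -- was: group by upc1, upc2
) as twoItem
JOIN EACH
(
    select purchase3.upc as upc3, count(*) as totalUpcCount
    from conagra.purchase as purchase3
    group each by upc3         -- was: group by upc3
) as totalUpc
on totalUpc.upc3 = twoItem.upc1
LIMIT 50;
```

The error message quotes the purchase3 subquery, which suggests the plain GROUP BY there is what the shuffle step of JOIN EACH could not parallelize; applying the same change to the first subquery follows the answer's wording of "inner queries".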