Joining two tables on two criteria: "Cannot partition on repeated field"

Asked: 2015-12-11 00:02:50

Tags: sql google-bigquery flatten

I am using BigQuery.

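I have a subquery that pulls from a table with the fields account_id, product, date, and product_spend. The subquery computes the total lifetime spend per product for each account_id by summing the individual line items.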

SELECT  account_id,
        product,
        SUM(product_spend)/1000000 lifetime_product_spend

FROM    usage
GROUP BY 1, 2

The result looks like this:

table: lifetime                        
account_id         product          lifetime_product_spend                
===========================================================         
    A              product1              50
    A              product2              20   
    B              product2              100
    B              product3              150
    C              product3              500

I am trying to keep these values and join them to a larger query:

SELECT  account_id,
        product,
        month,
        SUM(spend)

FROM data_source
WHERE month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY 1, 2, 3

The table produced by this query looks like this:

table: monthly                        
account_id         product              month            spend             
=================================================================
    A              product1              1                10
    A              product1              2                20
    A              product1              3                30
    A              product2              1                5
    A              product2              2                15
    B              product2              2                100
    B              product3              2                100
    B              product3              3                50
    C              product3              1                100
    C              product3              2                400

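I did not compute lifetime_product_spend with an aggregate on the second table because, given the data volume, I can only include the last 6 months of data there. That is why I compute lifetime spend in a separate table and join the two.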

My current query gives the wrong result:

SELECT  d.account_id,
        d.product,
        d.month,
        sum(d.spend),
        u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT  account_id,
           product,
           SUM(product_spend)/1000000 lifetime_product_spend
           FROM usage
           GROUP BY account_id, product) u
ON d.account_id = u.account_id
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend

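It does not assign the lifetime number to each product the way the lifetime table does, because I am joining only on account_id. See the erroneous output below. I have truncated the table: it essentially attaches every lifetime_product_spend value I have (5 of them) to each month, product, and company, since the join ignores which product each value belongs to: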

table: monthly                        
account_id         product           month         spend      lifetime_product_spend       
=====================================================================================
    A              product1           1             10                   50
    A              product1           1             10                   20
    A              product1           1             10                   100
    A              product1           1             10                   150
    A              product1           1             10                   500
    A              product1           2             20                   50
    A              product1           2             20                   20
    A              product1           2             20                   100
    A              product1           2             20                   150
    A              product1           2             20                   500
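
The row multiplication that comes from joining on too few keys can be reproduced on a small scale. The following is a minimal sketch, not BigQuery: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in engine and loads the two sample tables from this question under the hypothetical names lifetime and monthly. With only account_id in the ON clause, each monthly row is paired with every lifetime row of the same account, regardless of product.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lifetime (account_id TEXT, product TEXT, lifetime_product_spend INTEGER);
INSERT INTO lifetime VALUES
  ('A','product1',50), ('A','product2',20),
  ('B','product2',100), ('B','product3',150),
  ('C','product3',500);
CREATE TABLE monthly (account_id TEXT, product TEXT, month INTEGER, spend INTEGER);
INSERT INTO monthly VALUES
  ('A','product1',1,10), ('A','product1',2,20), ('A','product1',3,30),
  ('A','product2',1,5),  ('A','product2',2,15),
  ('B','product2',2,100), ('B','product3',2,100), ('B','product3',3,50),
  ('C','product3',1,100), ('C','product3',2,400);
""")

# Join on account_id only: the product of each lifetime row is ignored.
one_key = conn.execute("""
SELECT d.account_id, d.product, d.month, d.spend, u.lifetime_product_spend
FROM monthly d
LEFT JOIN lifetime u ON d.account_id = u.account_id
ORDER BY d.account_id, d.product, d.month, u.lifetime_product_spend
""").fetchall()

for row in one_key[:4]:
    print(row)
print(len(one_key), "rows from 10 monthly rows")
```

In this toy data, account A's product1 rows pick up the lifetime values of both product1 and product2, which is the same kind of duplication as in the truncated output above.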

Is there a way to join on both of them? I tried a JOIN ON x = x AND y = y:

SELECT  d.account_id,
        d.product,
        d.month,
        sum(d.spend),
        u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT  account_id,
           product,
           SUM(product_spend)/1000000 lifetime_product_spend
           FROM usage
           GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend

But it gives me this error: "Query Failed Error: Cannot partition on repeated field d.product". I would like my final table to look like this:

table: monthly                        
account_id         product           month         spend      lifetime_product_spend       
=====================================================================================
    A              product1           1             10                   50
    A              product1           2             20                   50
    A              product1           3             30                   50
    A              product2           1             5                    20
    A              product2           2             15                   20
    B              product2           2             100                  100
    B              product3           2             100                  150
    B              product3           3             50                   150
    C              product3           1             100                  500
    C              product3           2             400                  500

I think I need to use FLATTEN, but I cannot seem to get it in the right place. Thanks for reading.

2 Answers:

Answer 0: (score: 1)

SELECT  d.account_id,
        d.product,
        d.month,
        sum(d.spend),
        u.lifetime_product_spend
FROM FLATTEN(data_source, product) d
LEFT JOIN (SELECT  account_id,
           product,
           SUM(product_spend)/1000000 lifetime_product_spend
           FROM usage
           GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend

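The above works: the original data source is flattened around the repeated field d.product, so the join on product no longer hits a repeated field. Thanks for the comments and help.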
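
As a rough illustration of what flattening a repeated field means, here is a small Python sketch. It is not BigQuery's implementation: the records and the flatten helper are hypothetical, and the handling of empty lists is a choice of this sketch rather than a statement about BigQuery's behavior. The idea is that a record whose product field holds several values becomes several records with one scalar product each, which can then be compared in a join condition.

```python
# Hypothetical records: "product" is a repeated (list-valued) field.
records = [
    {"account_id": "A", "month": 1, "product": ["product1", "product2"]},
    {"account_id": "B", "month": 2, "product": ["product2"]},
]

def flatten(rows, field):
    """Yield one copy of each row per value of the repeated field.

    Rows whose list is empty yield nothing in this sketch.
    """
    for row in rows:
        for value in row[field]:
            # Copy the row, replacing the list with a single scalar value.
            yield {**row, field: value}

flat = list(flatten(records, "product"))
for row in flat:
    print(row)
```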

Answer 1: (score: 0)

Write the "SELECT ... FROM usage" query as a subquery and apply an INNER JOIN or LEFT JOIN to the data_source table.

SELECT  d.account_id,
        d.product,
        d.month,
        SUM(d.spend),
        u.lifetime_product_spend
FROM data_source d
LEFT JOIN (SELECT  account_id,
           product,
           SUM(product_spend)/1000000 lifetime_product_spend
           FROM usage
           GROUP BY account_id, product) u
ON (d.account_id = u.account_id AND d.product = u.product)
WHERE d.month >= DATE_ADD(today ,-5,"MONTH")
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
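
To check that a join on both account_id and product gives the desired table when product is an ordinary (non-repeated) column, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than BigQuery, uses the sample data from the question under the hypothetical table names lifetime and monthly, and leaves out the date filter.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lifetime (account_id TEXT, product TEXT, lifetime_product_spend INTEGER);
INSERT INTO lifetime VALUES
  ('A','product1',50), ('A','product2',20),
  ('B','product2',100), ('B','product3',150),
  ('C','product3',500);
CREATE TABLE monthly (account_id TEXT, product TEXT, month INTEGER, spend INTEGER);
INSERT INTO monthly VALUES
  ('A','product1',1,10), ('A','product1',2,20), ('A','product1',3,30),
  ('A','product2',1,5),  ('A','product2',2,15),
  ('B','product2',2,100), ('B','product3',2,100), ('B','product3',3,50),
  ('C','product3',1,100), ('C','product3',2,400);
""")

# Join on both keys so each monthly row matches exactly one lifetime row.
rows = conn.execute("""
SELECT d.account_id, d.product, d.month,
       SUM(d.spend) AS spend,
       u.lifetime_product_spend
FROM monthly d
LEFT JOIN lifetime u
  ON d.account_id = u.account_id AND d.product = u.product
GROUP BY d.account_id, d.product, d.month, u.lifetime_product_spend
ORDER BY d.account_id, d.product, d.month
""").fetchall()

for row in rows:
    print(row)
```

On this data the result has the same 10 rows as the monthly table, each carrying the lifetime value of its own product.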