BigQuery: LEFT JOIN a table and filter its array elements by a condition

Time: 2019-12-06 13:33:22

Tags: arrays google-bigquery

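I want to join a table to another table that contains arrays, and in the join result I only want the array elements that pass a condition, in this case a date condition. The snippet below illustrates my problem: I want the output to contain only the ids whose record_dates are earlier than '2019-10-15'.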

WITH platform AS (
        SELECT 'u1' AS id, 'm1' AS platform_id, '2019-10-12' as record_date
        UNION ALL
        SELECT 'u2' AS id, 'm1' AS platform_id, '2019-10-13' as record_date
        UNION ALL
        SELECT 'u21' AS id, 'm1' AS platform_id, '2019-10-16' as record_date    
), 

platform_agg AS (
        SELECT platform_id
              , ARRAY_AGG(id) as ids
              , ARRAY_AGG(record_date) as record_dates
        FROM platform
        GROUP BY platform_id
),


orders AS(
        SELECT 'u2' AS id, 'c1' AS order_id, '2019-10-15' as order_date 
), 


orders_plus_platform AS ( 
SELECT order_id
      , orders.id 
      , orders.order_date
      , platform.platform_id 
      , CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.ids ELSE [orders.id] END AS ids
      , CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.record_dates ELSE NULL END AS record_dates
FROM orders
    LEFT JOIN platform
        ON orders.id = platform.id and platform.record_date <= orders.order_date
    LEFT JOIN platform_agg
        ON platform.platform_id = platform_agg.platform_id 
)

SELECT * FROM orders_plus_platform

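Below is the output of the current query. In the desired output, however, every record_date should be earlier than '2019-10-15', so the u21 element should be filtered out.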

[Image: output of the current query]

Thanks.

2 answers:

Answer 0 (score: 1)

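The solution below worked for me. Basically, you join to the platform table twice to get all the ids associated with the platform, instead of joining to a pre-aggregated version of platform. That way you can apply the filter more easily.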

orders_plus_platform AS ( 
SELECT order_id
      , orders.id 
      , orders.order_date
      , platform.platform_id 
      , ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.id ELSE orders.id END) AS ids
      , ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.record_date ELSE NULL END) AS record_dates
FROM orders        
    LEFT JOIN platform
        ON orders.id = platform.id and platform.record_date <= orders.order_date
    LEFT JOIN platform platform2
       ON platform.platform_id = platform2.platform_id AND platform2.record_date <= orders.order_date 
 GROUP BY     
      order_id  
      , orders.id 
      , orders.order_date
      , platform.platform_id 
)
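
The fragment above only replaces the orders_plus_platform CTE, so here is a minimal end-to-end sketch that combines it with the question's sample data. Two details are additions of mine, not part of the answer: an ORDER BY inside each ARRAY_AGG, so that ids and record_dates stay aligned by position, and IGNORE NULLS on record_dates, because BigQuery raises an error when an array in the final result contains a NULL element (which could happen for an order with no platform match).

```sql
-- Sample data copied from the question.
WITH platform AS (
  SELECT 'u1' AS id, 'm1' AS platform_id, '2019-10-12' AS record_date
  UNION ALL
  SELECT 'u2', 'm1', '2019-10-13'
  UNION ALL
  SELECT 'u21', 'm1', '2019-10-16'
),
orders AS (
  SELECT 'u2' AS id, 'c1' AS order_id, '2019-10-15' AS order_date
),
orders_plus_platform AS (
  SELECT orders.order_id
       , orders.id
       , orders.order_date
       , platform.platform_id
       -- Fall back to the order's own id when no platform row matched.
       , ARRAY_AGG(
           CASE WHEN platform.platform_id IS NOT NULL THEN platform2.id ELSE orders.id END
           ORDER BY platform2.record_date, platform2.id   -- assumption: added for a stable order
         ) AS ids
       -- platform2.record_date is NULL exactly when no platform row matched,
       -- so IGNORE NULLS (assumption: added) keeps NULL out of the array.
       , ARRAY_AGG(
           platform2.record_date IGNORE NULLS
           ORDER BY platform2.record_date, platform2.id
         ) AS record_dates
  FROM orders
  LEFT JOIN platform
    ON orders.id = platform.id AND platform.record_date <= orders.order_date
  -- Second join: every id on the same platform, already filtered by date.
  LEFT JOIN platform AS platform2
    ON platform.platform_id = platform2.platform_id
   AND platform2.record_date <= orders.order_date
  GROUP BY orders.order_id, orders.id, orders.order_date, platform.platform_id
)
SELECT * FROM orders_plus_platform
```

With the sample data this should return a single row for order c1 whose ids are [u1, u2] and whose record_dates are [2019-10-12, 2019-10-13]; u21 is dropped because its record_date is later than the order date.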

Answer 1 (score: 0)

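You can use a subquery in the WHERE clause. The subquery can run over the unnested array and return a boolean, for example that the count of dates earlier than some value is greater than zero: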

SELECT c_id
      , c.id 
      , c.c_date
      , cxd.record_id 
      , CASE WHEN cxd.record_id IS NOT NULL THEN rd_agg.ids ELSE [c.id] END AS ids
      , CASE WHEN cxd.record_id IS NOT NULL THEN rd_agg.record_dates ELSE NULL END AS record_dates
FROM c
    LEFT JOIN record_ids cxd
        ON c.id = cxd.id and cxd.record_date <= c.c_date
    LEFT JOIN record_ids_agg rd_agg
        ON cxd.record_id = rd_agg.record_id 
  WHERE (SELECT COUNT(1)>0 FROM UNNEST(record_dates) AS r WHERE r < '2019-10-15')
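
Note that a WHERE clause like the one above keeps or drops whole rows; it does not remove individual elements from the arrays. If the goal is to keep the pre-aggregated table and trim its arrays, one possible approach is to rebuild each array with an ARRAY subquery over UNNEST ... WITH OFFSET. The sketch below uses a hypothetical inline platform_agg row and assumes that ids and record_dates are aligned by position, which the question's two separate ARRAY_AGG calls do not strictly guarantee.

```sql
-- Hypothetical pre-aggregated row, mirroring the question's platform_agg.
WITH platform_agg AS (
  SELECT 'm1' AS platform_id
       , ['u1', 'u2', 'u21'] AS ids
       , ['2019-10-12', '2019-10-13', '2019-10-16'] AS record_dates
)
SELECT platform_agg.platform_id
     -- Keep only the ids whose date at the same position passes the filter.
     , ARRAY(
         SELECT platform_agg.ids[OFFSET(pos)]
         FROM UNNEST(platform_agg.record_dates) AS record_date WITH OFFSET AS pos
         WHERE record_date < '2019-10-15'
         ORDER BY pos
       ) AS ids
     -- Apply the same filter to the dates themselves.
     , ARRAY(
         SELECT record_date
         FROM UNNEST(platform_agg.record_dates) AS record_date WITH OFFSET AS pos
         WHERE record_date < '2019-10-15'
         ORDER BY pos
       ) AS record_dates
FROM platform_agg
```

For this sample row the result should be ids = [u1, u2] and record_dates = [2019-10-12, 2019-10-13]. The string comparison works here only because the dates are ISO-formatted; with a DATE column the same pattern applies with a DATE literal.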