Left join with multiple criteria that are partly empty or null

Time: 2017-04-11 21:11:43

Tags: left-join google-bigquery

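I have a table with tracking data. Among other values, the table contains the columns traffic_medium, traffic_source and traffic_campaign. These columns sometimes contain (none) or null as a value.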

I want to use a left join, with medium, source and campaign as the matching criteria, to bring in the total number of visitors from another table.

This works fine when all columns contain data. It does not work when a column has (none) or null as its value.

I am using BigQuery with legacy SQL.

SELECT  
A.id,
A.trafficSource_medium,
A.trafficSource_source,
A.trafficSource_campaign,
B.sum_visitor AS sum_visitor

FROM [table] AS A
left outer join (Select 
count(distinct fullvisitorID) as sum_visitor,
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
FROM [table2] 
GROUP BY trafficSource_medium,
trafficSource_source,
trafficSource_campaign)
AS B
on A.trafficSource_medium=B.trafficSource_medium AND     
A.trafficSource_source=B.trafficSource_source AND 
A.trafficSource_campaign=B.trafficSource_campaign

Thanks for your help!

1 answer:

Answer 0: (score: 0)

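Try the query below. It assumes the respective fields are of STRING type. If they are INT, replace 'n/a' with, say, -999. The important thing is to pick a constant that never occurs as a real value in the respective field.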

  
#legacySQL
SELECT  
  A.id,
  CASE WHEN A.trafficSource_medium = 'n/a' THEN NULL ELSE A.trafficSource_medium END AS trafficSource_medium,
  CASE WHEN A.trafficSource_source = 'n/a' THEN NULL ELSE A.trafficSource_source END AS trafficSource_source,
  CASE WHEN A.trafficSource_campaign = 'n/a' THEN NULL ELSE A.trafficSource_campaign END AS trafficSource_campaign,
  B.sum_visitor AS sum_visitor
FROM (
  SELECT 
    id,
    IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
    IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
    IFNULL(trafficSource_campaign, 'n/a') AS trafficSource_campaign
  FROM [table] 
) AS A
LEFT OUTER JOIN (
  SELECT 
    COUNT(DISTINCT fullvisitorID) AS sum_visitor,
    IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
    IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
    IFNULL(trafficSource_campaign, 'n/a') AS trafficSource_campaign
  FROM [table2] 
  GROUP BY 
    trafficSource_medium,
    trafficSource_source,
    trafficSource_campaign
) AS B
ON A.trafficSource_medium = B.trafficSource_medium 
AND A.trafficSource_source = B.trafficSource_source 
AND A.trafficSource_campaign = B.trafficSource_campaign  

The idea here is to "convert" NULLs into some value so that they become JOIN-able, and then to "convert" that value back to NULL in the final SELECT.
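The round trip described above can be checked on a toy dataset. The following is a minimal sketch, not BigQuery code: it uses Python's built-in sqlite3 module as a stand-in engine, and the tables a and b, their columns and the sample rows are all hypothetical. It relies on the fact that NULL = NULL does not evaluate to true in a join condition, which holds in SQLite and matches the behaviour the question describes.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, medium TEXT, source TEXT);
CREATE TABLE b (visitor TEXT, medium TEXT, source TEXT);
INSERT INTO a VALUES (1, 'cpc', 'google'), (2, NULL, 'direct');
INSERT INTO b VALUES ('v1', 'cpc', 'google'), ('v2', 'cpc', 'google'),
                     ('v3', NULL, 'direct');
""")

# Plain equality join: the row whose medium is NULL finds no partner.
plain = conn.execute("""
SELECT a.id, a.medium, t.sum_visitor
FROM a
LEFT JOIN (
  SELECT COUNT(DISTINCT visitor) AS sum_visitor, medium, source
  FROM b
  GROUP BY medium, source
) AS t
ON a.medium = t.medium AND a.source = t.source
ORDER BY a.id
""").fetchall()

# Sentinel round trip: NULL -> 'n/a' on both sides before the join,
# then 'n/a' -> NULL again in the outer SELECT.
sentinel = conn.execute("""
SELECT x.id,
       CASE WHEN x.medium = 'n/a' THEN NULL ELSE x.medium END AS medium,
       t.sum_visitor
FROM (SELECT id, IFNULL(medium, 'n/a') AS medium, source FROM a) AS x
LEFT JOIN (
  SELECT COUNT(DISTINCT visitor) AS sum_visitor,
         IFNULL(medium, 'n/a') AS medium, source
  FROM b
  GROUP BY IFNULL(medium, 'n/a'), source
) AS t
ON x.medium = t.medium AND x.source = t.source
ORDER BY x.id
""").fetchall()

print(plain)
print(sentinel)
```

In the plain join, row 2 keeps a NULL visitor count, while the sentinel version returns a count for it and still shows NULL in the medium column.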

If you can migrate to standard SQL, you can try the code below. It needs fewer changes, mainly in the ON clause.

#standardSQL
SELECT  
  A.id,
  A.trafficSource_medium,
  A.trafficSource_source,
  A.trafficSource_campaign,
  B.sum_visitor AS sum_visitor
FROM `table` AS A
LEFT OUTER JOIN (
  SELECT 
    COUNT(DISTINCT fullvisitorID) AS sum_visitor,
    trafficSource_medium,
    trafficSource_source,
    trafficSource_campaign
  FROM `table2`
  GROUP BY 
    trafficSource_medium,
    trafficSource_source,
    trafficSource_campaign
) AS B
ON IFNULL(A.trafficSource_medium, 'n/a') = IFNULL(B.trafficSource_medium, 'n/a') 
AND IFNULL(A.trafficSource_source, 'n/a') = IFNULL(B.trafficSource_source, 'n/a') 
AND IFNULL(A.trafficSource_campaign, 'n/a') = IFNULL(B.trafficSource_campaign, 'n/a')
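
The ON-clause variant, and the earlier advice to pick a constant that is never used as a real value, can be illustrated with a small sketch. As before, this is not BigQuery code: it assumes Python's built-in sqlite3 module as a stand-in engine, and the tables a and b, the helper join_with_sentinel and the sentinel '__no_value__' are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, medium TEXT);
CREATE TABLE b (visitor TEXT, medium TEXT);
INSERT INTO a VALUES (1, 'cpc'), (2, NULL), (3, 'n/a');
INSERT INTO b VALUES ('v1', 'cpc'), ('v2', NULL), ('v3', NULL);
""")

# IFNULL is applied only inside the ON clause, so the selected columns
# keep their original NULLs and no CASE expression is needed.
QUERY = """
SELECT a.id, a.medium, t.sum_visitor
FROM a
LEFT JOIN (
  SELECT COUNT(DISTINCT visitor) AS sum_visitor, medium
  FROM b
  GROUP BY medium
) AS t
ON IFNULL(a.medium, ?) = IFNULL(t.medium, ?)
ORDER BY a.id
"""

def join_with_sentinel(sentinel):
    """Run the join using `sentinel` as the stand-in value for NULL."""
    return conn.execute(QUERY, (sentinel, sentinel)).fetchall()

# 'n/a' also occurs as a real value in table a (id 3), so that row is
# wrongly matched to the NULL group of table b.
colliding = join_with_sentinel("n/a")

# A constant that does not occur in the data avoids the false match.
safe = join_with_sentinel("__no_value__")

print(colliding)
print(safe)
```

With the colliding constant, row 3 picks up the visitor count that belongs to the NULL group. With the unused constant, only row 2 matches that group.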