BigQuery LEFT JOIN appears to be doing a CROSS JOIN

Date: 2018-04-06 12:43:05

Tags: sql join google-bigquery left-join cross-join

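I have two tables in BigQuery: one contains PPC ad data, the other contains enquiries. I want to join the two so that I can report on PPC revenue versus spend per day.

This felt very simple at first, but I have tried both a simple left join and a subquery, and hit obstacles with each. Here I am focusing on the left join.

I have: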

#standardSQL
SELECT 
  CAST(ppc.Date AS DATE) AS Date,
  COUNT(1) AS `Rows`,
  COUNT(DISTINCT(ppc.ID)) AS `PPCRows`,
  COUNT(DISTINCT(EnquiryId)) AS `EnquiryRows`
FROM
  `db.ppc_data.adgroup_performance_summary_report` ppc
LEFT JOIN
  `db.enquiries.output_final_scheduled` led
ON CAST(ppc.Date AS DATE) = CAST(led.EnquiryDateTime AS DATE)
WHERE
  SUBSTR(CAST(led.EnquiryDateTime AS STRING), 1, 7) = "2018-01"
GROUP BY 1

Despite being defined as a left join, the data returned suggests (I think) that this is doing a cross join: the value in the Rows column is the product of PPCRows and EnquiryRows.

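I really don't want to have to factor EnquiryRows into every aggregate column I need to add!

Also, the query takes an age to run. Is there a more efficient way to write it?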

2 answers:

Answer 0: (score: 1)

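It is definitely not a CROSS JOIN. It would be one if COUNT(1) were the product of COUNT(ppc.ID) and COUNT(EnquiryId).

Meanwhile, if you are not getting the result you expect, please post a specific question describing your use case.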
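As a rough illustration of why the Rows column can equal PPCRows multiplied by EnquiryRows for a given day, here is a minimal sketch using hypothetical inline data (the CTE names, the `dte` column, and the values are assumptions, not the asker's real tables). Because the join key is only the date, every PPC row for a day is paired with every enquiry for that same day.

```sql
#standardSQL
-- Hypothetical inline data: 2 PPC rows and 3 enquiries, all on the same day.
WITH ppc AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS ID, DATE '2018-01-01' AS dte),
    STRUCT(2 AS ID, DATE '2018-01-01' AS dte)
  ])
),
led AS (
  SELECT * FROM UNNEST([
    STRUCT(10 AS EnquiryId, DATE '2018-01-01' AS dte),
    STRUCT(11 AS EnquiryId, DATE '2018-01-01' AS dte),
    STRUCT(12 AS EnquiryId, DATE '2018-01-01' AS dte)
  ])
)
SELECT
  ppc.dte AS Date,
  COUNT(1) AS JoinedRows,                        -- one per (PPC row, enquiry) pair
  COUNT(DISTINCT ppc.ID) AS PPCRows,             -- distinct PPC rows for the day
  COUNT(DISTINCT led.EnquiryId) AS EnquiryRows   -- distinct enquiries for the day
FROM ppc
LEFT JOIN led
  ON ppc.dte = led.dte   -- non-unique key on both sides, so rows multiply per day
GROUP BY 1
```

With this data the join produces 2 × 3 = 6 rows for 2018-01-01, so JoinedRows is 6 while the two distinct counts are 2 and 3. Any SUM over a PPC column would be inflated by the same factor, which is the usual reason to aggregate each side before joining.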

Answer 1: (score: 0)

You probably want to aggregate before joining:

#standardSQL
SELECT ppc.dte AS Date, ppc.cnt AS PPCRows,
       led.cnt AS EnquiryRows
FROM (-- One row per day of PPC data
      SELECT CAST(ppc.Date AS DATE) AS dte, COUNT(*) AS cnt
      FROM `db.ppc_data.adgroup_performance_summary_report` ppc
      GROUP BY CAST(ppc.Date AS DATE)
     ) ppc LEFT JOIN
     (-- One row per day of enquiries
      SELECT CAST(led.EnquiryDateTime AS DATE) AS dte, COUNT(*) AS cnt
      FROM `db.enquiries.output_final_scheduled` led
      GROUP BY CAST(led.EnquiryDateTime AS DATE)
     ) led
     ON ppc.dte = led.dte
-- Filter on the left-hand side so days without enquiries are kept
WHERE ppc.dte >= DATE '2018-01-01' AND
      ppc.dte < DATE '2018-02-01'
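To tie this back to the goal of reporting revenue against spend per day, the following is a self-contained sketch of the same aggregate-then-join pattern on hypothetical inline data. The `Cost` and `Revenue` columns, the CTE names, and all values are assumptions made for illustration; the real tables may use different names.

```sql
#standardSQL
-- Hypothetical inline data; Cost and Revenue are assumed column names.
WITH ppc_raw AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2018-01-01' AS dte, 5.0 AS Cost),
    STRUCT(DATE '2018-01-01' AS dte, 7.0 AS Cost),
    STRUCT(DATE '2018-01-02' AS dte, 4.0 AS Cost)
  ])
),
led_raw AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2018-01-01' AS dte, 20.0 AS Revenue),
    STRUCT(DATE '2018-01-01' AS dte, 30.0 AS Revenue),
    STRUCT(DATE '2018-01-01' AS dte, 10.0 AS Revenue)
  ])
),
-- Collapse each side to one row per day before joining.
ppc AS (
  SELECT dte, COUNT(*) AS PPCRows, SUM(Cost) AS Spend
  FROM ppc_raw
  GROUP BY dte
),
led AS (
  SELECT dte, COUNT(*) AS EnquiryRows, SUM(Revenue) AS TotalRevenue
  FROM led_raw
  GROUP BY dte
)
SELECT
  ppc.dte AS Date,
  ppc.PPCRows,
  ppc.Spend,
  led.EnquiryRows,
  led.TotalRevenue
FROM ppc
LEFT JOIN led
  ON ppc.dte = led.dte   -- at most one row per day on each side
ORDER BY 1
```

Since each side has at most one row per day when the join happens, the sums are not multiplied: 2018-01-01 comes out with 2 PPC rows, a spend of 12, 3 enquiries and a revenue of 60, while 2018-01-02 keeps its PPC figures and shows NULL for the enquiry columns.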