Numbers returned after GROUP BY and JOIN are far too large

Posted: 2018-12-15 00:01:49

Tags: sql google-bigquery left-join

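I'm trying to blend marketing data from Google with marketing data from Facebook by location. The first SELECT takes its columns from the table produced by the nested SELECT on line 5 (counting from the first line of the query), which I then have to join with another table to get the DMA name (line 11). Finally, I combine the result with the Facebook data. When I run the query and sum over all DMAs, clicks, spend, and impressions all come out absurdly large, when according to our metrics they should be between 10 million and 100 million.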

I'm really new to SQL, so I'm sure there is a better way to approach this, and I'm sure my syntax doesn't follow best practices. All feedback is welcome.

SELECT SUM(sub.clicks) AS clicks, SUM(sub.spend) AS spend,
       SUM(sub.impressions) AS impressions, sub.date,
       location_with_adwordsID.DMA_NAME, sub.ad_network_type_2
FROM
       (SELECT SUM(clicks) AS clicks, SUM(cost) AS spend,
               SUM(impressions) AS impressions,
               CAST(date AS DATE) AS date, city_criteria_id,
               ad_network_type_2
        FROM adwords.location
        GROUP BY date, city_criteria_id, ad_network_type_2) AS sub
LEFT JOIN location_conversion.location_with_adwordsID ON
         CAST(sub.city_criteria_id AS STRING) =
         CAST(location_with_adwordsID.criteria_id AS STRING)
GROUP BY date, DMA_NAME, ad_network_type_2
UNION ALL
(SELECT SUM(clicks) AS clicks, SUM(spend) AS spend,
        SUM(impressions) AS impressions, CAST(date AS DATE) AS date,
        LOWER(dma) AS fbdma, 'Facebook' AS Source
 FROM facebook_ad_insights_dma.ad_insights_locations
 GROUP BY date, fbdma)
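A likely cause can be reproduced on a small scale: if the lookup table holds one row per ZIP code, a city with several ZIP codes matches several times in the LEFT JOIN, and its metrics are summed once per match. The sketch below uses Python's built-in sqlite3 module rather than BigQuery; the rows and the zip_code column are invented for illustration, and only the shape of the query mirrors the one above.

```python
import sqlite3

# Hypothetical toy data: table names mirror the question, but the rows
# and the zip_code column are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adwords_location (date TEXT, city_criteria_id INTEGER, clicks INTEGER);
CREATE TABLE location_with_adwordsID (criteria_id INTEGER, dma_name TEXT, zip_code TEXT);
INSERT INTO adwords_location VALUES
    ('2018-12-01', 1001, 10),
    ('2018-12-01', 1002, 20);
-- One row per ZIP code, so city 1001 appears three times.
INSERT INTO location_with_adwordsID VALUES
    (1001, 'Boston', '02108'),
    (1001, 'Boston', '02109'),
    (1001, 'Boston', '02110'),
    (1002, 'Boston', '02139');
""")

# Same shape as the question's query: pre-aggregate in a subquery, then LEFT JOIN.
fanned_out = conn.execute("""
SELECT SUM(sub.clicks), sub.date, loc.dma_name
FROM (SELECT SUM(clicks) AS clicks, date, city_criteria_id
      FROM adwords_location
      GROUP BY date, city_criteria_id) AS sub
LEFT JOIN location_with_adwordsID AS loc
  ON sub.city_criteria_id = loc.criteria_id
GROUP BY sub.date, loc.dma_name
""").fetchall()

# The real total, read straight from the source table.
true_clicks = conn.execute("SELECT SUM(clicks) FROM adwords_location").fetchone()[0]

print(fanned_out)   # [(50, '2018-12-01', 'Boston')]
print(true_clicks)  # 30
```

City 1001's 10 clicks are counted three times (once per ZIP row), so the DMA total is 50 instead of 30. With many ZIP codes per city in a real table, the inflation grows accordingly.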

Here is the structure of the "location_with_adwordsID" table: https://drive.google.com/file/d/1oKd3O_fVOjwO1EnZ5LFjHIiB3EB32be5/view?usp=sharing

Here is the structure of the "adwords.location" table: https://drive.google.com/file/d/1XlHC7Ug2yW9XNkNR6kolmmJPrfUa-S6n/view?usp=sharing

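Why the LEFT JOIN: Google Ads gives me location data keyed by what appears to be a proprietary "city_id". To combine this data with the Facebook data, I need to add a DMA column to the AdWords table and then merge the Facebook and Google data. That's where my "location_with_adwordsID" table comes in: it's a table made by Google that lists the DMA and ZIP code for each city_id. So after this join, I expect a table with the same number of rows as "adwords.location", plus one extra "DMA" column.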
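The expectation that the join keeps the row count of adwords.location only holds if location_with_adwordsID has at most one row per criteria_id. A quick way to test that, sketched here with Python's sqlite3 and a few invented rows (the zip_code column and all values are hypothetical), is to look for repeated criteria_id values and to compare row counts before and after the join. The same two queries should carry over to BigQuery with the real table names and the CAST(... AS STRING) join condition from the question.

```python
import sqlite3

# Hypothetical miniature of the two tables; the real data lives in BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adwords_location (city_criteria_id INTEGER, clicks INTEGER);
CREATE TABLE location_with_adwordsID (criteria_id INTEGER, dma_name TEXT, zip_code TEXT);
INSERT INTO adwords_location VALUES (1001, 10), (1002, 20);
INSERT INTO location_with_adwordsID VALUES
    (1001, 'Boston', '02108'), (1001, 'Boston', '02109'), (1002, 'Boston', '02139');
""")

# 1. Which criteria_id values appear more than once in the lookup table?
dupes = conn.execute("""
SELECT criteria_id, COUNT(*) AS n
FROM location_with_adwordsID
GROUP BY criteria_id
HAVING COUNT(*) > 1
""").fetchall()

# 2. Does the LEFT JOIN preserve the row count of adwords_location?
before = conn.execute("SELECT COUNT(*) FROM adwords_location").fetchone()[0]
after = conn.execute("""
SELECT COUNT(*)
FROM adwords_location AS ad
LEFT JOIN location_with_adwordsID AS loc
  ON ad.city_criteria_id = loc.criteria_id
""").fetchone()[0]

print(dupes)          # [(1001, 2)]
print(before, after)  # 2 3
```

If the first query returns any rows, or the two counts differ, the join is duplicating rows and the summed metrics will be inflated.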

Thanks.

1 answer:

Answer 0 (score: 0)

Without seeing the table structures and some sample data, it's hard to give a definitive answer.

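That said, judging from your SQL, the first SELECT contains a nested query you don't need: you can drop the sub subquery, join adwords.location directly to location_conversion.location_with_adwordsID, and apply the SUM aggregates in a single GROUP BY. This simplifies the query, although the join itself can still duplicate rows, as explained below.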

Try:

SELECT SUM(ad.clicks) AS clicks, SUM(ad.cost) AS spend,
       SUM(ad.impressions) AS impressions, CAST(ad.date AS DATE) AS date,
       loc.DMA_NAME AS dma_name, ad.ad_network_type_2
FROM adwords.location AS ad
LEFT JOIN location_conversion.location_with_adwordsID AS loc
       ON CAST(ad.city_criteria_id AS STRING) = CAST(loc.criteria_id AS STRING)
GROUP BY date, dma_name, ad_network_type_2
UNION ALL
SELECT SUM(clicks) AS clicks, SUM(spend) AS spend,
       SUM(impressions) AS impressions, CAST(date AS DATE) AS date,
       LOWER(dma) AS fbdma, 'Facebook' AS Source
FROM facebook_ad_insights_dma.ad_insights_locations
GROUP BY date, fbdma

If you still get unrealistic numbers, check the relationship between ad (my alias for adwords.location) and loc (my alias for location_conversion.location_with_adwordsID): if loc contains more than one row for a given criteria_id, the query counts the same ad row once for every match, which inflates the sums. In that case you have to refine loc by adding extra conditions so that each criteria_id matches a single row.
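One way to apply that last suggestion, assuming the duplicate rows in loc come from listing several ZIP codes per criteria_id, is to collapse the lookup to one row per criteria_id and DMA before joining. This is a swapped-in illustration, not necessarily the extra condition the answer had in mind. It is sketched with Python's sqlite3 and invented rows (zip_code is a hypothetical column); in BigQuery the derived table would read SELECT DISTINCT criteria_id, DMA_NAME FROM location_conversion.location_with_adwordsID.

```python
import sqlite3

# Hypothetical toy data: city 1001 has three ZIP rows in the lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adwords_location (date TEXT, city_criteria_id INTEGER, clicks INTEGER);
CREATE TABLE location_with_adwordsID (criteria_id INTEGER, dma_name TEXT, zip_code TEXT);
INSERT INTO adwords_location VALUES ('2018-12-01', 1001, 10), ('2018-12-01', 1002, 20);
INSERT INTO location_with_adwordsID VALUES
    (1001, 'Boston', '02108'), (1001, 'Boston', '02109'),
    (1001, 'Boston', '02110'), (1002, 'Boston', '02139');
""")

# Collapse the lookup to one row per (criteria_id, dma_name) before joining,
# so each ad row matches at most one lookup row per DMA.
deduped = conn.execute("""
SELECT SUM(ad.clicks) AS clicks, ad.date, loc.dma_name
FROM adwords_location AS ad
LEFT JOIN (SELECT DISTINCT criteria_id, dma_name
           FROM location_with_adwordsID) AS loc
  ON ad.city_criteria_id = loc.criteria_id
GROUP BY ad.date, loc.dma_name
""").fetchall()

# Guard: a criteria_id mapped to several DMAs would still fan out.
ambiguous = conn.execute("""
SELECT criteria_id
FROM location_with_adwordsID
GROUP BY criteria_id
HAVING COUNT(DISTINCT dma_name) > 1
""").fetchall()

print(deduped)    # [(30, '2018-12-01', 'Boston')]
print(ambiguous)  # []
```

The second query is a guard: if some criteria_id maps to more than one DMA, DISTINCT alone cannot remove the duplication, and the join needs a further condition to pick one DMA per city.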