Joining tables without an equal field

Time: 2019-03-13 20:02:46

Tags: sql google-bigquery

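I am using BigQuery to build Google Analytics reports. I want to create a view that contains, for each "page" I have, the total page views and the total page views over the last 30 days. I can do this successfully for a single page, but I get an error when I try to do it for all pages at once. I understand why the error occurs, but I am not sure whether, or how, this can be done for all pages in a single query.

Here is the query I use to get the data for a single page: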

SELECT
   (SELECT (SELECT sum(pageviews)
    FROM `centiva_ga_stats_page_views.report`
    WHERE pagepath LIKE '%28848290%')) as pages_views_all_time,

   (SELECT (SELECT sum(pageviews)
    FROM `centiva_ga_stats_page_views.report`
    WHERE pagepath LIKE '%28848290%' 
    AND date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY))) as page_views_30_days

When I keep my page id hard-coded, the query runs, but it obviously returns the same result for every page:

SELECT ins.NO_INSCRIPTION,
    (SELECT (SELECT sum(pageviews)
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%')) as pages_views_all_time,
(SELECT (SELECT sum(pageviews)
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%' AND date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY))) as page_views_30_days
FROM staging_remaxlist.Inscriptions ins

I tried replacing the hard-coded wildcard page id with CONCAT('%',ins.NO_INSCRIPTION,'%'), but I get this error:

  

LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

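I more or less understand why, but I have no solution that makes the query work.

The pagepath field contains my page id, but otherwise it can be anything (there is no standard format).

Thanks for your help!

Here is a simplified version of my database schema:

Table staging_remaxlist.Inscriptions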

Field             Type
id                INTEGER 
no_inscription    STRING   

Table centiva_ga_stats_page_views.report

Field         Type
date          TIMESTAMP
start_date    TIMESTAMP
end_date      TIMESTAMP
pagepath      STRING
pageviews     INTEGER

Examples of centiva_ga_stats_page_views.report.pagepath:

/en/house-for-sale-laurentides/350-rue-de-lucerne-ste-adele-17269832.rmx
/en/propertyview/12616898
/en/our-properties/gatineau-gatineau/181-rue-duquette-o/12078284
/fr/showproperty/18726771
/wp-content/plugins/hydrogene-wp/public/cache/16543327fr.html
/Properties/enhanceddetails/4e04ec20-M11699403?language=FR

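In other words, anything that can contain the staging_remaxlist.Inscriptions.NO_INSCRIPTION field, which is a 7- to 8-digit number (we should expect it to reach 9 digits).

1 Answer:

Answer 0: (score: 1)

Thanks for the updated schema. It sounds like what you really want is a regular-expression match on the page path, and then a join between the matched value and your Inscriptions table. The query below should work: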

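WITH total_pageviews AS (
    -- Aggregate per extracted id. Both totals are computed here because
    -- date and pageviews are not visible outside this CTE.
    SELECT 
      -- pp stays a STRING so it compares directly with NO_INSCRIPTION (STRING in the schema)
      REGEXP_EXTRACT(pagepath, r"[0-9]+") AS pp
      , SUM(pageviews) AS total_pageviews
      -- Conditional sum: only rows from the last 30 days contribute
      , SUM(IF(date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY), pageviews, 0)) AS page_views_30_days
    FROM `centiva_ga_stats_page_views.report`
    WHERE REGEXP_CONTAINS(pagepath, r"[0-9]+")
    GROUP BY REGEXP_EXTRACT(pagepath, r"[0-9]+")
)
SELECT ins.NO_INSCRIPTION, ga.total_pageviews, ga.page_views_30_days
FROM total_pageviews ga JOIN `staging_remaxlist.Inscriptions` ins 
  ON ga.pp = ins.NO_INSCRIPTION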
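
One caveat with a pattern like [0-9]+: REGEXP_EXTRACT returns the first match, so for a path such as /en/house-for-sale-laurentides/350-rue-de-lucerne-ste-adele-17269832.rmx it picks up the street number 350 rather than the listing number. The sketch below compares that pattern with a length-bounded one, [0-9]{7,9}, on a few of the sample paths from the question. The bounds are an assumption taken from the "7 to 8, maybe 9 digits" description, and sample_paths, first_digit_run and listing_id are illustrative names only, not part of the real schema. It also assumes a path never contains some other digit run of 7 or more digits ahead of the listing number.

```sql
-- Sketch only: sample paths from the question are inlined, so no table is needed.
WITH sample_paths AS (
  SELECT pagepath
  FROM UNNEST([
    '/en/house-for-sale-laurentides/350-rue-de-lucerne-ste-adele-17269832.rmx',
    '/en/propertyview/12616898',
    '/wp-content/plugins/hydrogene-wp/public/cache/16543327fr.html',
    '/Properties/enhanceddetails/4e04ec20-M11699403?language=FR'
  ]) AS pagepath
)
SELECT
  pagepath,
  -- First run of digits, whatever its length
  REGEXP_EXTRACT(pagepath, r'[0-9]+') AS first_digit_run,
  -- First run of 7 to 9 digits (assumed length of NO_INSCRIPTION)
  REGEXP_EXTRACT(pagepath, r'[0-9]{7,9}') AS listing_id
FROM sample_paths
```

If the two columns differ on your real data, swapping the bounded pattern into the query above is probably the safer choice for the join.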

Previous query:

WITH page_views_all AS (
    SELECT pagepath AS pp, sum(pageviews) AS page_views_all_time
    FROM `centiva_ga_stats_page_views.report`
    WHERE pagepath LIKE '%28848290%'
    GROUP BY pagepath
)

SELECT pp, page_views_all_time, SUM(pageviews) AS page_views_30_days
FROM page_views_all pva
  INNER JOIN `centiva_ga_stats_page_views.report` pv
  ON pva.pp = pv.pagepath
WHERE pagepath LIKE '%28848290%' 
  AND date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY)
GROUP BY pp, page_views_all_time