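I'm using BigQuery to generate reports on Google Analytics data. I want to create a view that holds, for every "page" I have, the all-time total page views and the total page views for the last 30 days. I can do this successfully for a single page, but I get an error when I try to do it for all pages at once. I understand why, but I'm not sure whether it is possible, or how, to handle all pages in a single query.
Here is the query I use to get the data for a single page: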
SELECT
(SELECT (SELECT sum(pageviews)
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%')) as pages_views_all_time,
(SELECT (SELECT sum(pageviews)
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%'
AND date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY))) as page_views_30_days
When I keep my page id hard-coded, the query runs, but obviously it produces the same result for every page:
SELECT ins.NO_INSCRIPTION,
(SELECT (SELECT sum(pageviews)
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%')) as pages_views_all_time,
(SELECT (SELECT sum(pageviews)
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%' AND date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY))) as page_views_30_days
FROM staging_remaxlist.Inscriptions ins
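I tried replacing my hard-coded page id wildcard with CONCAT('%', ins.NO_INSCRIPTION, '%'), but I get this error:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I sort of understand why, but I don't have a solution to make the query work.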
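The pagepath field contains my page id, but the rest of the path can be anything (there is no standard format).
Thanks for your help!
Here is my simplified database schema:
Table staging_remaxlist.Inscriptions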
Field Type
id INTEGER
no_inscription STRING
Table centiva_ga_stats_page_views.report
Field Type
date TIMESTAMP
start_date TIMESTAMP
end_date TIMESTAMP
pagepath STRING
pageviews INTEGER
Examples of centiva_ga_stats_page_views.report.pagepath:
/en/house-for-sale-laurentides/350-rue-de-lucerne-ste-adele-17269832.rmx
/en/propertyview/12616898
/en/our-properties/gatineau-gatineau/181-rue-duquette-o/12078284
/fr/showproperty/18726771
/wp-content/plugins/hydrogene-wp/public/cache/16543327fr.html
/Properties/enhanceddetails/4e04ec20-M11699403?language=FR
The path can be anything that contains the staging_remaxlist.Inscriptions.NO_INSCRIPTION value, which is a 7- to 8-digit integer (we should expect it to reach 9 digits).
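Answer 0: (score: 1)
Thanks for the updated schema. It sounds like what you really want is a regular-expression match on the page path, followed by a join between that matched value and your Inscriptions table. The query below should work: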
WITH total_pageviews AS (
SELECT
-- First run of 7 to 9 digits in the path, kept as a STRING so it compares directly with NO_INSCRIPTION
REGEXP_EXTRACT(pagepath, r"[0-9]{7,9}") AS pp
, SUM(pageviews) AS total_pageviews
-- Conditional sum: rows older than 30 days contribute 0
, SUM(IF(date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY), pageviews, 0)) AS page_views_30_days
FROM `centiva_ga_stats_page_views.report`
WHERE REGEXP_CONTAINS(pagepath, r"[0-9]{7,9}")
GROUP BY pp
)
SELECT ga.pp, ga.total_pageviews, ga.page_views_30_days
FROM total_pageviews ga JOIN `staging_remaxlist.Inscriptions` ins
ON ga.pp = ins.NO_INSCRIPTION
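Since the join depends entirely on which digits the regular expression pulls out of pagepath, it may be worth checking the pattern against the sample paths from the question before running it on the real table. The sketch below feeds those paths in as inline data and compares an unbounded pattern with one bounded to 7 to 9 digits. The column names first_digit_run and listing_id are made up for illustration, and the 7 to 9 bound is an assumption taken from the id length described in the question.

```sql
-- Inline copy of the sample paths from the question (no table access needed).
WITH samples AS (
  SELECT pagepath
  FROM UNNEST([
    '/en/house-for-sale-laurentides/350-rue-de-lucerne-ste-adele-17269832.rmx',
    '/en/propertyview/12616898',
    '/en/our-properties/gatineau-gatineau/181-rue-duquette-o/12078284',
    '/fr/showproperty/18726771',
    '/wp-content/plugins/hydrogene-wp/public/cache/16543327fr.html',
    '/Properties/enhanceddetails/4e04ec20-M11699403?language=FR'
  ]) AS pagepath
)
SELECT
  pagepath,
  -- Unbounded: returns the first run of digits, whatever its length.
  REGEXP_EXTRACT(pagepath, r'[0-9]+') AS first_digit_run,
  -- Bounded: returns the first run of 7 to 9 consecutive digits.
  REGEXP_EXTRACT(pagepath, r'[0-9]{7,9}') AS listing_id
FROM samples
```

On these samples the unbounded pattern stops at street numbers such as 350 and 181, while the bounded one reaches the trailing id. A path holding two separate 7 to 9 digit runs would still yield only the first one, so real data may need a stricter pattern.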
Previous query:
WITH page_views_all AS (
SELECT pagepath AS pp, sum(pageviews) AS page_views_all_time
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%'
GROUP BY pagepath
)
SELECT pp, page_views_all_time, SUM(pageviews) AS page_views_30_days
FROM page_views_all pva
INNER JOIN `centiva_ga_stats_page_views.report` pv
ON pva.pp = pv.pagepath
WHERE pagepath LIKE '%28848290%'
AND date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY)
GROUP BY pp, page_views_all_time
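As a possible simplification of the previous query, both totals can be computed in one pass over the report table with a conditional sum, which avoids joining the table to itself. This is only a sketch under the schema shown in the question. One behavioural difference to be aware of: a path with no views in the last 30 days is returned with 0 here, whereas the self-join version drops it.

```sql
-- Single scan of the report table, one output row per matching pagepath.
SELECT
  pagepath AS pp,
  SUM(pageviews) AS page_views_all_time,
  -- Rows older than 30 days contribute 0 to this sum.
  SUM(IF(date > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -30 DAY), pageviews, 0)) AS page_views_30_days
FROM `centiva_ga_stats_page_views.report`
WHERE pagepath LIKE '%28848290%'
GROUP BY pagepath
```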