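I'm used to BigQuery, where I can define temporary tables with a 'WITH' clause and then combine them in a final query. I'm now using a Hive database through DataGrip, and there I can't run sequential temp tables in a single query execution. Instead, I have to highlight each temp table block (within one script) and execute it, then the next, then the next, ... which is annoying.
I need help with two things:
Does anyone know how I can run sequential temp tables and then join them all at the end with a final SELECT statement?
In addition, I've found that the temp tables are stored in my session, and I have to drop them with a line of code, which isn't necessary in BigQuery (annoying again). Can anyone help me with this? Sometimes a temp table's column names change, and I don't want to worry about dropping the earlier temp table that still has the old column names.
Here is a code example: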
-- audience temp table
CREATE TEMPORARY VIEW aud AS (
SELECT
exp_luid
FROM audience_manager.segments5_luid
WHERE segment_version_id IN (627, 629)
)
-- KVJ table
CREATE TEMPORARY VIEW prod AS (
SELECT
station_callsign,
exp_luid,
ds,
ad_start_ts_utc as ad_time,
COUNT(ds) AS impressions
FROM vizio_production.kantar_vizio_v4_new
WHERE product_id = 36325675
AND ds BETWEEN 20190101 AND 20190430
AND exp_luid IS NOT NULL
GROUP BY 1,2,3,4
)
-- Join KVJ and audience data set
CREATE TEMPORARY VIEW join_one AS (
SELECT
aud.exp_luid AS exp_luid,
prod.station_callsign AS network,
prod.ds AS ds,
prod.ad_time AS ad_time,
SUM(prod.impressions) AS impressions
FROM aud
INNER JOIN prod ON aud.exp_luid = prod.exp_luid
GROUP BY 1,2,3,4
)
SELECT * FROM join_one
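The goal is to run the final SELECT from 'join_one' without caching the temp tables, executing the whole SQL script in a single run.
Answer 0 (score: 0)
The Hive documentation leads me to believe this will work: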
WITH aud AS (
SELECT
exp_luid
FROM audience_manager.segments5_luid
WHERE segment_version_id IN (627, 629)
),
prod AS (
SELECT
station_callsign,
exp_luid,
ds,
ad_start_ts_utc as ad_time,
COUNT(ds) AS impressions
FROM vizio_production.kantar_vizio_v4_new
WHERE product_id = 36325675
AND ds BETWEEN 20190101 AND 20190430
AND exp_luid IS NOT NULL
GROUP BY 1,2,3,4
),
join_one AS (
SELECT
aud.exp_luid AS exp_luid,
prod.station_callsign AS network,
prod.ds AS ds,
prod.ad_time AS ad_time,
SUM(prod.impressions) AS impressions
FROM aud
INNER JOIN prod ON aud.exp_luid = prod.exp_luid
GROUP BY 1,2,3,4
)
SELECT * FROM join_one
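To illustrate why this should cover both points in the question, here is a minimal self-contained sketch of chained CTEs. The names `base` and `doubled` and the literal values are made up for illustration. Each CTE can read from the ones defined before it, and all of them exist only for the duration of that one statement, so nothing should be left in the session to drop afterwards:

```sql
-- Hypothetical names and data, for illustration only.
WITH base AS (
  SELECT 1 AS id
),
doubled AS (
  -- A later CTE can read from an earlier one.
  SELECT id, id * 2 AS twice
  FROM base
)
SELECT id, twice
FROM doubled;
```

Because the whole thing is a single statement, it can be executed in one go without highlighting individual blocks.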
That said, I'm not quite sure why this needs to be expanded into CTEs at all, since the single query is relatively compact:
SELECT
aud.exp_luid AS exp_luid,
prod.station_callsign AS network,
prod.ds AS ds,
prod.ad_time AS ad_time,
SUM(prod.impressions) AS impressions
FROM
audience_manager.segments5_luid aud
INNER JOIN
(
SELECT
station_callsign,
exp_luid,
ds,
ad_start_ts_utc as ad_time,
COUNT(ds) AS impressions
FROM vizio_production.kantar_vizio_v4_new
WHERE product_id = 36325675
AND ds BETWEEN 20190101 AND 20190430
AND exp_luid IS NOT NULL
GROUP BY 1,2,3,4
) prod
ON aud.exp_luid = prod.exp_luid
WHERE aud.segment_version_id IN (627, 629)
GROUP BY 1,2,3,4
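If the intermediate results really do need to live as separate session objects, one possible way to make the script re-runnable is to end each statement with a semicolon and recreate the views on every run. The sketch below assumes the engine is Spark SQL, which the `CREATE TEMPORARY VIEW` syntax in the question suggests. Whether DataGrip then runs the whole script in one execution depends on its settings, which I haven't verified.

```sql
-- Assumes Spark SQL. OR REPLACE overwrites any view left over from an
-- earlier run, so a changed column list does not need a manual DROP.
CREATE OR REPLACE TEMPORARY VIEW aud AS
SELECT exp_luid
FROM audience_manager.segments5_luid
WHERE segment_version_id IN (627, 629);

-- Equivalent two-step form: drop the view if present, then create it.
DROP VIEW IF EXISTS aud;
CREATE TEMPORARY VIEW aud AS
SELECT exp_luid
FROM audience_manager.segments5_luid
WHERE segment_version_id IN (627, 629);
```

Only one of the two forms is needed; the same pattern would apply to `prod` and `join_one`.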