I have a query in Qubole that takes almost 3 hours to run. I'm looking for suggestions to reduce its execution time.
WITH
-- Reference values: the date 10 days before today, as a yyyymmdd BIGINT (day) and as a DATE, plus the country filter
d
AS (
SELECT CAST(CONCAT (
SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 1, 4),
SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 6, 2),
SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 9, 2)
) AS BIGINT) AS day,
CAST(CONCAT (
SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 1, 4),
'-',
SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 6, 2),
'-',
SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 9, 2)
) AS DATE) AS DATE,
'FR' AS country
)
SELECT 'Streaming' AS TRANSACTION,
'Spotify' AS account,
p_day,
access,
COUNT(DISTINCT customer_id) AS users,
COUNT(*) AS units
FROM temp_1
WHERE day >= (
SELECT day
FROM d
)
AND country_code = (
SELECT country
FROM d
)
GROUP BY 1,
2,
3,
4
UNION ALL
SELECT 'Streaming' AS TRANSACTION,
'Deezer' AS account,
p_day,
CASE
WHEN offer_code IN ('APP', 'BAO', 'BDP', 'BDS', 'BMO', 'BMS', 'BMW', 'BPF', 'BPP', 'BPR', 'BSO', 'BWE', 'BWP', 'BWS', 'DEE', 'DEP', 'ETT', 'EXT', 'FFX', 'IOS', 'OT1', 'PBH', 'PE1', 'PE2', 'PEM', 'PLS', 'PRM', 'PSC', 'PTP', 'SDP', 'SMG', 'SPF', 'SPP', 'SPR', 'SUP', 'SWE', 'SWP', '3M', 'FAM', 'GOO', 'GOF', 'HFP', 'HFF', 'HFI')
THEN 'premium'
WHEN offer_code IN ('BFR', 'MFS', 'MOD', 'SMR')
THEN 'free'
ELSE NULL
END AS access,
COUNT(DISTINCT masked_consumer_id) AS users,
SUM(units_sold_streams) AS streams
FROM temp_2
WHERE day >= (
SELECT day
FROM d
)
AND country_code = (
SELECT country
FROM d
)
GROUP BY 1,
2,
3,
4
UNION ALL
SELECT 'Streaming' AS TRANSACTION,
'Apple Music' AS account,
ingest_datestamp AS p_day,
'premium' AS access,
COUNT(DISTINCT anonymized_person_id) AS users,
COUNT(*) AS streams
FROM temp_streams1
WHERE ingest_datestamp >= (
SELECT DATE
FROM d
)
AND country_code = (
SELECT country
FROM d
)
GROUP BY 1,
2,
3,
4
Answer 0 (score: 0)
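This won't do much for query performance, but it will simplify the code. The date calculation can be reduced to the following (tested on Presto):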
-- Drop-in replacement for the SELECT inside CTE d
SELECT CAST(DATE_FORMAT(DATE_ADD('day', -10, CURRENT_DATE), '%Y%m%d') AS BIGINT) AS day,
       DATE_ADD('day', -10, CURRENT_DATE) AS date,
       'FR' AS country
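To check that the shorter expression yields the same `day` value as the original CONCAT/SUBSTR chain, here is a rough Python equivalent of both. Python is used only for illustration: `reference_day` and `reference_day_substr` are hypothetical helper names, and a fixed `today` stands in for `CURRENT_DATE`. Note that SQL `SUBSTR` is 1-based, so `SUBSTR(x, 6, 2)` corresponds to the Python slice `x[5:7]`.

```python
from __future__ import annotations

from datetime import date, timedelta


def reference_day(today: date, days_back: int = 10) -> tuple[int, date]:
    """Mirror of the simplified SQL: format the shifted date as %Y%m%d and cast to an integer."""
    ref = today - timedelta(days=days_back)
    return int(ref.strftime("%Y%m%d")), ref


def reference_day_substr(today: date, days_back: int = 10) -> int:
    """Mirror of the original SQL: slice 'YYYY-MM-DD' into parts and concatenate them."""
    s = (today - timedelta(days=days_back)).isoformat()  # e.g. '2024-02-24'
    # SUBSTR(s, 1, 4), SUBSTR(s, 6, 2), SUBSTR(s, 9, 2) in 1-based SQL terms
    return int(s[0:4] + s[5:7] + s[8:10])


if __name__ == "__main__":
    today = date(2024, 3, 5)
    print(reference_day(today))         # (20240224, datetime.date(2024, 2, 24))
    print(reference_day_substr(today))  # 20240224
```

Both paths go through the same date arithmetic; the simplified version just formats once instead of casting and slicing the string three times.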
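To improve performance, I suggest partitioning the tables by date and, depending on how much data each country code holds, by country code as well. You should also pass the date-derived values in as parameters instead of computing them in subqueries, so that partition pruning actually takes effect.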
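One way to follow the parameter advice is to compute the filter values outside the query and inline them as literals, so each `WHERE` clause compares the partition column against a constant rather than a scalar subquery. The sketch below assumes that `day` is a BIGINT yyyymmdd column, that `ingest_datestamp` is a DATE (the original compares it with a DATE), and that the query is launched from some script or scheduler; `WHERE_TEMPLATES` and `build_filters` are hypothetical names.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical templates: the question's three WHERE clauses with the
# scalar subqueries (SELECT day FROM d, SELECT country FROM d, ...) replaced by placeholders.
WHERE_TEMPLATES = {
    "temp_1": "day >= {day} AND country_code = '{country}'",
    "temp_2": "day >= {day} AND country_code = '{country}'",
    "temp_streams1": "ingest_datestamp >= DATE '{date}' AND country_code = '{country}'",
}


def build_filters(today: date, country: str = "FR", days_back: int = 10) -> dict[str, str]:
    """Render each table's WHERE clause with constant values the planner can prune on."""
    # The country is inlined into SQL text, so only accept a two-letter uppercase code.
    if not (len(country) == 2 and country.isalpha() and country.isupper()):
        raise ValueError(f"unexpected country code: {country!r}")
    ref = today - timedelta(days=days_back)
    params = {
        "day": int(ref.strftime("%Y%m%d")),  # BIGINT partition value
        "date": ref.isoformat(),             # body of a DATE literal
        "country": country,
    }
    return {table: tpl.format(**params) for table, tpl in WHERE_TEMPLATES.items()}


if __name__ == "__main__":
    for table, clause in build_filters(date(2024, 3, 5)).items():
        print(f"{table}: WHERE {clause}")
```

Whether this alone enables pruning depends on how the tables are partitioned; the point is only that constant predicates give the engine something it can match against partition values at planning time.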