I am trying to run the following query against a dataset stored in Redshift:
SELECT v_users.user_id AS user_id,
v_users.first_name AS first_name,
v_users.email AS email,
COALESCE(v_users.country, accounts.region) AS country_code,
profiles.language AS language,
v_users.mobilenum AS mobile_num,
NULL as mobile_verification_date,
COALESCE(v_users.registration_date, accounts.date_created) AS activation_date,
EXISTS (SELECT 1
FROM cds.user_session_201612 AS users_session,
cds.access_logs_summary_201612 AS access_logs_summary,
views_legacy AS views_legacy
WHERE users_session.userid = v_users.user_id
OR access_logs_summary.userid = v_users.user_id
OR views_legacy.user_id = v_users.user_id) AS has_viewed,
NULL as preferred_genre_1,
NULL as preferred_genre_2,
NULL as preferred_genre_3
FROM users AS v_users,
users_metadata AS v_users_metadata,
account.account AS accounts,
account.profile AS profiles
WHERE accounts.id = v_users.user_id
AND profiles.id = v_users.user_id
AND v_users_metadata.user_id = v_users.user_id
The error I get is:
ERROR: This type of correlated subquery pattern is not supported due to internal error
It is caused by the subquery, but how can I work around it? Do you have any suggestions?
Answer 0 (score: 0)
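Redshift does not support this kind of correlated subquery in the SELECT clause. I don't consider that much of a limitation, since every example I have come across can be expressed another way.
I refactored the subquery into a CTE and used a left join with is not null to flag users who have or have not viewed something.
The specific query below may not work as-is, but any solution will likely take this form: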
WITH has_viewed AS (
SELECT
u.user_id
FROM users u
LEFT JOIN cds.user_session_201612 AS users_session
ON users_session.userid = u.user_id
LEFT JOIN cds.access_logs_summary_201612 AS access_logs_summary
ON access_logs_summary.userid = u.user_id
LEFT JOIN views_legacy
ON views_legacy.user_id = u.user_id
WHERE users_session.userid IS NOT NULL
OR access_logs_summary.userid IS NOT NULL
OR views_legacy.user_id IS NOT NULL
GROUP BY 1
)
SELECT
v_users.user_id AS user_id
, v_users.first_name AS first_name
, v_users.email AS email
, COALESCE(v_users.country, accounts.region) AS country_code
, profiles.language AS language
, v_users.mobilenum AS mobile_num
, NULL as mobile_verification_date
, COALESCE(v_users.registration_date, accounts.date_created) AS activation_date
, has_viewed.user_id IS NOT NULL AS has_viewed
, NULL as preferred_genre_1
, NULL as preferred_genre_2
, NULL as preferred_genre_3
FROM users AS v_users
JOIN users_metadata AS v_users_metadata
ON v_users_metadata.user_id = v_users.user_id
JOIN account.account AS accounts
ON accounts.id = v_users.user_id
JOIN account.profile AS profiles ON profiles.id = v_users.user_id
LEFT JOIN has_viewed
ON has_viewed.user_id = v_users.user_id
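One thing to watch with a CTE built from several left joins: each user's rows are multiplied across the three activity tables before the GROUP BY collapses them again. A possible variation, which is a different technique from the joins above, is to collect the distinct user IDs with UNION instead. This is only a sketch: the table and column names are copied from the question, it assumes the three ID columns have compatible types, and it has not been run against a real cluster.

```sql
-- Sketch only: names come from the question; ID column types are assumed compatible.
WITH has_viewed AS (
    -- UNION (not UNION ALL) removes duplicates, so each user ID appears once.
    SELECT userid AS user_id FROM cds.user_session_201612
    UNION
    SELECT userid AS user_id FROM cds.access_logs_summary_201612
    UNION
    SELECT user_id FROM views_legacy
)
SELECT
    v_users.user_id,
    -- TRUE when the user appears in at least one of the three tables.
    has_viewed.user_id IS NOT NULL AS has_viewed
FROM users AS v_users
LEFT JOIN has_viewed
    ON has_viewed.user_id = v_users.user_id;
```

The remaining columns and joins from the full query can be added back to the outer SELECT unchanged; only the definition of has_viewed differs.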
Answer 1 (score: 0)
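I tried every combination I could think of: the SELECT subquery did not work, and a CTE (common table expression) did not work either. I then needed an alternative to GROUP BY, because Redshift would not accept the GROUP BY in my query.
So I arrived at this solution: the OVER keyword.
As a replacement for GROUP BY, I used OVER with PARTITION BY, like this: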
SELECT *
FROM (
SELECT *,ROW_NUMBER()
OVER (PARTITION BY **VARIOUS COLUMNS** ORDER BY datetime DESC) rn
FROM schema.tableName
) derivedTable
WHERE derivedTable.rn = 1;
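To make the pattern above concrete, here is a minimal sketch that keeps only the most recent row per user. The table events and its columns user_id, event_time and page are hypothetical and exist only for illustration; they do not come from the question.

```sql
-- Hypothetical table: events(user_id, event_time, page).
SELECT user_id, event_time, page
FROM (
    SELECT
        user_id,
        event_time,
        page,
        -- Number each user's rows from newest to oldest.
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY event_time DESC
        ) AS rn
    FROM events
) AS latest
WHERE latest.rn = 1;  -- keep only the newest row per user
```

Unlike GROUP BY, this returns whole rows rather than aggregates, which is why it works as a de-duplication step.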
Maybe OVER will help you. I'm not sure.