Redshift subquery not accepted

Date: 2017-01-17 00:33:10

Tags: sql amazon-redshift bigdata

I'm trying to run the following query against a dataset stored in Redshift:

SELECT v_users.user_id AS user_id,
   v_users.first_name AS first_name,
   v_users.email AS email,
   COALESCE(v_users.country, accounts.region) AS country_code,
   profiles.language AS language,
   v_users.mobilenum AS mobile_num,
   NULL as mobile_verification_date,
   COALESCE(v_users.registration_date, accounts.date_created) AS activation_date,
   EXISTS (SELECT 1
             FROM cds.user_session_201612 AS users_session,
                  cds.access_logs_summary_201612 AS access_logs_summary,
                  views_legacy AS views_legacy
            WHERE users_session.userid = v_users.user_id
               OR access_logs_summary.userid = v_users.user_id
               OR views_legacy.user_id = v_users.user_id) AS has_viewed,
   NULL as preferred_genre_1,
   NULL as preferred_genre_2,
   NULL as preferred_genre_3
FROM users AS v_users,
     users_metadata AS v_users_metadata,
     account.account AS accounts,
     account.profile AS profiles
WHERE accounts.id = v_users.user_id
  AND profiles.id = v_users.user_id
  AND v_users_metadata.user_id = v_users.user_id

The error I get is:

ERROR:  This type of correlated subquery pattern is not supported due to internal error

It's caused by the subquery, but how can I fix it? Any suggestions?
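Separately from the Redshift error, the subquery's comma-separated FROM list builds the Cartesian product of the three log tables. If any one of those tables has no rows, the product is empty and EXISTS returns false for every user, even one who appears in the other two tables. The sketch below uses hypothetical toy CTEs in place of the real tables and compares that shape with one EXISTS per table. It assumes a PostgreSQL-style engine; Redshift itself may still reject the correlated subqueries, so this only illustrates the semantics.

```sql
-- Toy stand-ins (hypothetical data) for the question's tables.
WITH users AS (SELECT 1 AS user_id),
     user_session AS (SELECT 1 AS userid),                      -- user 1 has a session
     access_logs_summary AS (SELECT 1 AS userid WHERE 1 = 0),   -- empty
     views_legacy AS (SELECT 1 AS user_id WHERE 1 = 0)          -- empty
SELECT u.user_id,
       -- Original shape: cross product of the three tables, so it is
       -- empty (and EXISTS is false) because two of the tables are empty.
       EXISTS (SELECT 1
                 FROM user_session AS s, access_logs_summary AS a, views_legacy AS v
                WHERE s.userid = u.user_id
                   OR a.userid = u.user_id
                   OR v.user_id = u.user_id) AS cross_join_check,
       -- One EXISTS per table, combined with OR: true for user 1.
       EXISTS (SELECT 1 FROM user_session AS s WHERE s.userid = u.user_id)
       OR EXISTS (SELECT 1 FROM access_logs_summary AS a WHERE a.userid = u.user_id)
       OR EXISTS (SELECT 1 FROM views_legacy AS v WHERE v.user_id = u.user_id)
           AS per_table_check
FROM users AS u;
```

For user 1, cross_join_check comes out false and per_table_check comes out true.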

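2 answers:

Answer 0 (score: 0)

Redshift doesn't support correlated subqueries like this one in the SELECT clause. I don't see that as a real limitation, because every example I've run into could be expressed another way.

I've refactored the subquery into a CTE and used LEFT JOIN together with IS NOT NULL to flag users who have or haven't viewed something.

The specific query below may not work as-is, but any solution will probably take this form: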
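The core of that rewrite can be shown on toy data. This is a minimal sketch, assuming hypothetical CTEs users and view_log in place of the real tables: the lookup table is first collapsed to one row per user_id, then LEFT JOINed, and IS NOT NULL on the joined key replaces the EXISTS (...) flag.

```sql
-- Hypothetical sample data standing in for the real tables.
WITH users AS (
    SELECT 1 AS user_id UNION ALL
    SELECT 2 UNION ALL
    SELECT 3
),
view_log AS (
    -- user 1 has two view rows, user 2 has one, user 3 has none
    SELECT 1 AS user_id UNION ALL
    SELECT 1 UNION ALL
    SELECT 2
),
-- One row per viewer, so the LEFT JOIN below cannot duplicate users.
viewed AS (
    SELECT DISTINCT user_id FROM view_log
)
SELECT u.user_id,
       viewed.user_id IS NOT NULL AS has_viewed   -- takes the place of EXISTS (...)
FROM users AS u
LEFT JOIN viewed ON viewed.user_id = u.user_id
ORDER BY u.user_id;
```

Users 1 and 2 are flagged as having viewed; user 3 is not. Deduplicating before the join matters: joining view_log directly would return user 1 twice.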

WITH has_viewed AS (
  SELECT 
      u.user_id
  FROM users u
  LEFT JOIN cds.user_session_201612 AS users_session 
         ON users_session.userid = u.user_id
  LEFT JOIN cds.access_logs_summary_201612 AS access_logs_summary 
         ON access_logs_summary.userid = u.user_id
  LEFT JOIN views_legacy
         ON views_legacy.user_id = u.user_id
  WHERE users_session.userid IS NOT NULL
     OR access_logs_summary.userid IS NOT NULL
     OR views_legacy.user_id IS NOT NULL
  GROUP BY 1
)
SELECT 
   v_users.user_id AS user_id
 , v_users.first_name AS first_name
 , v_users.email AS email
 , COALESCE(v_users.country, accounts.region) AS country_code
 , profiles.language AS language
 , v_users.mobilenum AS mobile_num
 , NULL as mobile_verification_date
 , COALESCE(v_users.registration_date, accounts.date_created) AS activation_date
 , has_viewed.user_id IS NOT NULL AS has_viewed
 , NULL as preferred_genre_1
 , NULL as preferred_genre_2
 , NULL as preferred_genre_3
FROM users AS v_users
JOIN users_metadata AS v_users_metadata 
  ON v_users_metadata.user_id = v_users.user_id
JOIN account.account AS accounts 
  ON accounts.id = v_users.user_id
JOIN account.profile AS profiles ON profiles.id = v_users.user_id
LEFT JOIN has_viewed 
       ON has_viewed.user_id = v_users.user_id
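In the has_viewed CTE above, the three LEFT JOINs multiply rows for users who appear several times in each log table, and GROUP BY 1 then has to collapse them again. One alternative (not part of the original answer) is to stack the user IDs from the three sources with UNION, which also removes duplicates, and LEFT JOIN that list. The sketch below is self-contained: the CTEs are hypothetical toy stand-ins for cds.user_session_201612, cds.access_logs_summary_201612, views_legacy and users, and in the real query you would drop them and read the actual tables.

```sql
WITH user_session AS (              -- stands in for cds.user_session_201612
    SELECT 1 AS userid UNION ALL SELECT 1
),
access_logs_summary AS (            -- stands in for cds.access_logs_summary_201612
    SELECT 2 AS userid
),
views_legacy AS (                   -- empty on purpose
    SELECT 1 AS user_id WHERE 1 = 0
),
users AS (
    SELECT 1 AS user_id UNION ALL SELECT 2 UNION ALL SELECT 3
),
-- UNION (not UNION ALL) de-duplicates, so each viewer appears once.
has_viewed AS (
    SELECT userid AS user_id FROM user_session
    UNION
    SELECT userid FROM access_logs_summary
    UNION
    SELECT user_id FROM views_legacy
)
SELECT v_users.user_id,
       has_viewed.user_id IS NOT NULL AS has_viewed
FROM users AS v_users
LEFT JOIN has_viewed ON has_viewed.user_id = v_users.user_id
ORDER BY v_users.user_id;
```

Users 1 and 2 come back flagged and user 3 does not; the empty views_legacy stand-in does not affect the other two sources.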

Answer 1 (score: 0)

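I tried every combination I could think of:

  1. A subquery in the SELECT clause didn't work.
  2. The approach shown by Haleemur Ali didn't work either.
  3. A CTE (common table expression) didn't work either.
  4. Then I needed an alternative to GROUP BY, because Redshift wasn't accepting GROUP BY in my query. This is the solution I found:

    The OVER keyword.

    As an alternative to GROUP BY, I used OVER with PARTITION BY, like this: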

    SELECT *
    FROM (
        SELECT *,ROW_NUMBER() 
        OVER (PARTITION BY **VARIOUS COLUMNS** ORDER BY datetime DESC) rn
        FROM schema.tableName
    ) derivedTable
    WHERE derivedTable.rn = 1;
    

    Maybe OVER can help you too. I'm not sure.
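The ROW_NUMBER() pattern above keeps one row per partition: rows are numbered within each PARTITION BY group in ORDER BY order, and the outer query keeps only rn = 1. A self-contained sketch, assuming a hypothetical events table with one partition column (user_id) and ISO-formatted date strings, which sort chronologically as text:

```sql
WITH events AS (                    -- hypothetical sample data
    SELECT 1 AS user_id, '2016-12-01' AS event_time UNION ALL
    SELECT 1, '2016-12-15' UNION ALL
    SELECT 2, '2016-12-03'
)
SELECT user_id, event_time
FROM (
    SELECT user_id,
           event_time,
           -- 1 = newest row within each user_id
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY event_time DESC) AS rn
    FROM events
) AS ranked
WHERE ranked.rn = 1
ORDER BY user_id;
```

This returns (1, '2016-12-15') and (2, '2016-12-03'). Unlike GROUP BY, the window function keeps every column of the chosen row instead of requiring aggregates for the non-grouped columns.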