Selecting NULL in a WITH statement

Time: 2015-01-19 10:56:52

Tags: sql postgresql select null common-table-expression

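I'm using PostgreSQL in SQLWorkbenchJ and I'm struggling.

I have a WITH statement that selects dates based on a row number. If the statement can't find a given row number, I want NULL to be selected in that date field. That doesn't happen at the moment: the query only returns records where all of the fields are non-null. I assume it has something to do with the joins, but I'm not sure.

The current statement is below. It should return around 50,000 records, but it currently returns fewer than 2,000.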

WITH FifthEnquiry AS
(
  SELECT emailaddress,
         SentDate,
         ROW_NUMBER() OVER (PARTITION BY emailaddress ORDER BY COUNT(*) DESC) AS rk
  FROM SentEmails
  GROUP BY emailaddress,
           SentDate
),
TenthEnquiry AS
(
  SELECT emailaddress,
         SentDate,
         ROW_NUMBER() OVER (PARTITION BY emailaddress ORDER BY COUNT(*) DESC) AS rk
  FROM SentEmails
  GROUP BY emailaddress,
           SentDate
),
TwentiethEnquiry AS
(
  SELECT emailaddress,
         SentDate,
         ROW_NUMBER() OVER (PARTITION BY emailaddress ORDER BY COUNT(*) DESC) AS rk
  FROM SentEmails
  GROUP BY emailaddress,
           SentDate
)
SELECT FifthEnquiry.emailaddress,
       FifthEnquiry.SentDate AS Fifth,
       TenthEnquiry.SentDate AS Tenth,
       TwentiethEnquiry.SentDate AS Twentieth
FROM FifthEnquiry
  JOIN TenthEnquiry ON FifthEnquiry.emailaddress = TenthEnquiry.emailaddress
  JOIN TwentiethEnquiry ON FifthEnquiry.emailaddress = TwentiethEnquiry.emailaddress
WHERE (FifthEnquiry.rk = 5)
AND   (TenthEnquiry.rk = 10)
AND   (TwentiethEnquiry.rk = 20)

1 answer:

Answer 0: (score: 3)

You can simplify quite a bit. Use LEFT JOIN to keep every email address that has at least 5 rows after the GROUP BY, even if it has no 10th or 20th row:

WITH cte AS (
   SELECT emailaddress, SentDate,
          ROW_NUMBER() OVER (PARTITION BY emailaddress
                             ORDER BY COUNT(*) DESC, SentDate) AS rn
   FROM   SentEmails
   GROUP  BY 1,2
   )
SELECT enq05.emailaddress,
       enq05.SentDate AS fifth,
       enq10.SentDate AS tenth,
       enq20.SentDate AS twentieth
FROM        cte AS enq05
LEFT   JOIN cte AS enq10 ON enq10.emailaddress = enq05.emailaddress
                        AND enq10.rn = 10
LEFT   JOIN cte AS enq20 ON enq20.emailaddress = enq05.emailaddress
                        AND enq20.rn = 20
WHERE  enq05.rn = 5;
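
  • You don't need separate CTEs; all three do exactly the same thing. One CTE is enough, and obviously faster. Use a self-join with different table aliases in the outer query instead.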

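  • Since we are now using LEFT JOIN, it matters whether the additional conditions go in the JOIN clause or in the WHERE clause. Conditions on the right-hand table in the WHERE clause would effectively force Postgres to treat the join as a plain [INNER] JOIN, because rows without a match carry NULL there and fail the condition. I moved the conditions to the JOIN clause accordingly.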

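  • Use rn, not rk, as the column alias. It is a "row number", not a "rank". Note the important difference in behavior between row_number() and rank(): rank() gives tied rows the same number and leaves gaps after them, while row_number() numbers every row distinctly.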

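  • I added SentDate to ORDER BY as a tiebreaker between (emailaddress, SentDate) groups with the same count, to get a stable sort order. The way I have it, SentDate IS NULL sorts last within each group. If you sort SentDate in descending order instead, you may want to add NULLS LAST (not applicable to COUNT(*), which is never NULL), e.g. ORDER BY COUNT(*) DESC, SentDate DESC NULLS LAST.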

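  • One more subtle detail you need to be aware of: tenth and twentieth can be NULL in the result for two different reasons, if SentDate can be NULL in the underlying table. A NULL value for tenth in the result can mean that there are fewer than 10 distinct values of SentDate for that address, or it can mean that NULL sorts in the 10th position according to your sort order.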
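
To illustrate the point about where the extra condition goes, here is a minimal sketch on made-up data. The temporary table numbered and its rows are hypothetical and simply stand in for the output of the CTE above: address 'a' has a 5th and a 10th row, address 'b' only a 5th.

```sql
-- Hypothetical stand-in for the CTE result (not part of the original schema).
CREATE TEMP TABLE numbered (emailaddress text, sentdate date, rn int);
INSERT INTO numbered VALUES
   ('a', '2015-01-05', 5)
 , ('a', '2015-01-10', 10)
 , ('b', '2015-01-07', 5);

-- Condition in the JOIN clause: both 'a' and 'b' are returned,
-- and tenth is NULL for 'b' because no row with rn = 10 matches.
SELECT enq05.emailaddress, enq05.sentdate AS fifth, enq10.sentdate AS tenth
FROM   numbered AS enq05
LEFT   JOIN numbered AS enq10 ON enq10.emailaddress = enq05.emailaddress
                             AND enq10.rn = 10
WHERE  enq05.rn = 5
ORDER  BY enq05.emailaddress;

-- Condition in the WHERE clause: only 'a' is returned. For 'b' the
-- LEFT JOIN yields enq10.rn = NULL, which fails "enq10.rn = 10",
-- so the join behaves like an [INNER] JOIN.
SELECT enq05.emailaddress, enq05.sentdate AS fifth, enq10.sentdate AS tenth
FROM   numbered AS enq05
LEFT   JOIN numbered AS enq10 ON enq10.emailaddress = enq05.emailaddress
WHERE  enq05.rn = 5
AND    enq10.rn = 10
ORDER  BY enq05.emailaddress;
```

The second form is essentially what the query in the question does, which is why addresses without a 10th or 20th row disappear from its result.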
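
The difference between row_number() and rank() mentioned above can be seen on a small example. The inline VALUES list is invented for illustration only; cnt plays the role of COUNT(*) in the real query.

```sql
-- Three hypothetical rows for one address; two of them tie on cnt.
SELECT emailaddress, cnt,
       row_number() OVER (PARTITION BY emailaddress ORDER BY cnt DESC) AS rn,
       rank()       OVER (PARTITION BY emailaddress ORDER BY cnt DESC) AS rk
FROM  (VALUES ('a', 3), ('a', 3), ('a', 1)) AS t(emailaddress, cnt);
-- The two rows with cnt = 3 get rn 1 and 2 (in arbitrary order) but both rk 1;
-- the row with cnt = 1 gets rn 3 and rk 3, so rk = 2 never occurs.
```

With rank(), a filter such as rk = 5 can therefore match no row or several rows for an address, while rn = 5 matches at most one.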
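
If the two reasons for a NULL need to be told apart, one possible approach (an assumption on my part, not something from the original answer) is to also return whether the join found a row at all. The table numbered_null_demo and the column has_tenth below are hypothetical names used only for this sketch.

```sql
-- Hypothetical stand-in for the CTE result, with a NULL date in 10th position.
CREATE TEMP TABLE numbered_null_demo (emailaddress text, sentdate date, rn int);
INSERT INTO numbered_null_demo VALUES
   ('a', '2015-01-05', 5)
 , ('a', NULL,         10)   -- a 10th row exists, but its date is NULL
 , ('b', '2015-01-07', 5);   -- no 10th row at all

SELECT enq05.emailaddress,
       enq05.sentdate       AS fifth,
       enq10.sentdate       AS tenth,
       enq10.rn IS NOT NULL AS has_tenth  -- true only if a 10th row was joined
FROM   numbered_null_demo AS enq05
LEFT   JOIN numbered_null_demo AS enq10 ON enq10.emailaddress = enq05.emailaddress
                                       AND enq10.rn = 10
WHERE  enq05.rn = 5
ORDER  BY enq05.emailaddress;
-- Both result rows have tenth = NULL, but has_tenth is true for 'a'
-- and false for 'b'.
```

This works because rn itself is never NULL in a joined row, so a NULL in enq10.rn can only come from the LEFT JOIN finding no match.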