Question

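I need to convert SAS code (PROC SQL) to (Postgres) SQL, in particular the SAS calculated keyword, which lets a column defined in a query be reused directly in the same query to compute another column: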

SELECT 
     id,
     sum( case
         when (sales > 0) then 1
         when (sales = 0) then 0
         else -1 
     end) as pre_freq,
     (case 
         when calculated pre_freq > 0 then calculated pre_freq 
         else 1 
     end) as freq
FROM my_table
GROUP BY id

This isn't possible in SQL (AFAIK), so I need to break the calculation down into separate steps.

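I'd like to know what the best option is. As I understand it, it is better to do more computation and fewer table scans, i.e. to do as much of the calculation as possible during a single scan rather than run several table-scan steps that each do a small calculation.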

In this particular example, I could use:

SELECT 
       id
     , greatest(1, sum( case
         when (sales > 0) then 1
         when (sales = 0) then 0
         else -1 
     end)) as freq
FROM 
     my_table
GROUP BY id

Or:

SELECT 
       id
       , (case when sum(case
                when (sales > 0) then 1
                when (sales < 0) then -1 
                else 0
        end) > 0 then sum(case
                when (sales  > 0) then 1
                when (sales  < 0) then -1 
                else 0
        end) else 1 end) as freq
FROM 
     my_table
GROUP BY id

...which starts to get hard to read...

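Is there any way to define a variable for a piece of SQL code that is going to be repeated?
More generally, is this example the best (most efficient) way to do it?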

Answer 1

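calculated is a nice feature of proc sql. However, you cannot reuse a column alias within the same select list in a database (this is not a Postgres-specific limitation). A simple approach is to use a subquery or CTE: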

select id, pre_freq,
       (case when pre_freq > 0 then pre_freq 
             else 1 
        end) as freq
from (select id,
             sum(case when (sales > 0) then 1
                      when (sales = 0) then 0
                      else -1 
                 end) as pre_freq
      from my_table t
      group by id
     ) t;
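
The same idea can be written as a CTE. This is only a sketch against the my_table and sales names from the question; the CTE name per_id is made up for illustration. Either form reads my_table once, since the outer query only works on the already aggregated rows. In PostgreSQL 12 and later a CTE like this is normally inlined by the planner, while older versions materialize it first; with one row per id that should rarely matter.

```sql
-- per_id is a hypothetical name; it plays the role of the subquery above.
with per_id as (
    select id,
           sum(case when sales > 0 then 1
                    when sales = 0 then 0
                    else -1                 -- negative and NULL sales count as -1
               end) as pre_freq
    from my_table
    group by id
)
select id,
       pre_freq,
       greatest(pre_freq, 1) as freq        -- same result as the case expression
from per_id;
```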

However, the simplest solution is to use sign():

select id, sum(sign(sales)) as pre_freq,
       greatest(sum(sign(sales)), 1) as freq
from my_table t
group by id;

Note: this is slightly different. It essentially ignores NULL values. If you really need to treat NULL as -1, use coalesce().
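
A sketch of the coalesce() variant, assuming the goal is to match the original case expression, where a NULL sales value falls into the else branch and counts as -1:

```sql
select id,
       sum(coalesce(sign(sales), -1)) as pre_freq,              -- NULL sales counted as -1
       greatest(sum(coalesce(sign(sales), -1)), 1) as freq
from my_table t
group by id;
```

The aggregate is still written twice, but it is short enough to stay readable, and the query remains a single pass over my_table. One thing to check: as far as I know, sign() in Postgres returns numeric or double precision rather than integer, so add a cast (for example ::int) if an integer result is required.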
