Aliases and GROUP BY statements in SAS Proc SQL

Time: 2017-04-24 20:10:10

Tags: sql sas proc-sql

I am using proc SQL in SAS, and one of my proc sql queries is behaving very strangely:

I have a large dataset (about 1 million rows) that looks like this:

apple_key    profit    price    cost    months    date      
golden_d     0.03      12       4       3         01/12
golden_d     0.03      8        0       2         01/12
granny_s     0.05      15       5       5         02/12
red_d        0.04      13       0       1         01/12
golden_d     0.02      1        2       12        03/14

On this dataset, I run the following query:

%let picking_date = 01/12; /* I simplify here - this part of my code definitely works */

proc sql; 
    CREATE TABLE output AS 
    SELECT 
        (CASE apple_key
              WHEN "golden_d" THEN 1
              WHEN "granny_s" THEN 2
              WHEN "red_d"    THEN 3
        END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0 
            ELSE 1 
        END) AS cost_flag,
        (CASE 
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
            ELSE 5
        END) AS age, 
        "McDonalds" as farm, 
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
        AND cost >= 0
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ; 
quit; /* PROC SQL is ended with QUIT; a RUN statement is ignored */

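When I run this, my GROUP BY statement does not work. I get a bunch of entries for a single group (where apple_id, apple_name, cost_flag, age and farm are all the same), so the aggregation is not happening.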

However, when I run the GROUP BY as a separate step (as shown below), everything works fine. I get one entry per group, with its "price-weighted profit":

proc sql; 
    CREATE TABLE output_tmp AS 
    SELECT 
        (CASE apple_key
              WHEN "golden_d" THEN 1
              WHEN "granny_s" THEN 2
              WHEN "red_d"    THEN 3
        END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0 
            ELSE 1 
        END) AS cost_flag,
        (CASE 
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
            ELSE 5
        END) AS age, 
        "McDonalds" as farm
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
        AND cost >= 0
   ;

    CREATE TABLE output AS
    SELECT 
        apple_id, 
        apple_name, 
        cost_flag, 
        age, 
        farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM output_tmp
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ;
quit;

Why is this happening, and how can I fix it? This is driving me a bit crazy... Thanks in advance for the help.

3 answers:

Answer 0 (score: 1)

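It does not work because GROUP BY is not treating the sum(profit*price)/sum(price) expression as an aggregate function. It fails to do so because of the aliases such as age, cost_flag, etc.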

Here is the corrected query:

 Proc sql;
    CREATE TABLE output AS 
     SELECT 
            apple_id, 
            apple_name, 
            cost_flag, 
            age, 
            farm, 
            sum(profit*price)/sum(price) as price_weighted_profit
        FROM
       (
        SELECT 
            (CASE apple_key
                  WHEN "golden_d" THEN 1
                  WHEN "granny_s" THEN 2
                  WHEN "red_d"    THEN 3
            END) AS apple_id,
            apple_key AS apple_name,
            (CASE WHEN cost= 0 THEN 0 
                ELSE 1 
            END) AS cost_flag,
            (CASE 
                WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
                ELSE 5
            END) AS age, 
            "McDonalds" as farm
        FROM input_table
        WHERE date = "&picking_date."d
            AND price > cost
            AND cost >= 0
            AND cost >= 0

        ) a
        GROUP BY apple_id, apple_name, cost_flag, age, farm;
        quit;

Let me know if you have any questions.

Answer 1 (score: 0)

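Rule of thumb: whenever you use an aggregate function in the SELECT clause, all of the remaining columns should be part of the GROUP BY. In the query you posted, you compute sum(profit*price)/sum(price) without a GROUP BY that matches the other selected columns, and that is what causes the problem.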
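A minimal illustration of this rule, using the SASHELP.CLASS sample table that ships with SAS (an assumption for the example; it is not part of the question): both non-aggregated columns, SEX and AGE, appear in the GROUP BY, so the result should collapse to one row per SEX/AGE combination.

```sas
/* Every non-aggregated SELECT column (sex, age) is listed in GROUP BY, */
/* so PROC SQL returns one summary row per sex/age combination.        */
proc sql;
    select sex,
           age,
           count(*)     as n_students,
           mean(height) as avg_height
    from sashelp.class
    group by sex, age;
quit;
```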


Answer 2 (score: 0)

I suspect that what is happening is remerging. SAS proc sql accepts code like this:

proc sql;
    select a.*, count(*)
    from a;
quit;

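This does not summarize the data. Instead, it puts the total count on every row. In other words, if the keys in the SELECT do not exactly match the GROUP BY, the aggregate functions are computed per GROUP BY key, but the results are attached to the individual rows. Other databases achieve this with a subset of window functions.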
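A small sketch of remerging, again assuming the SASHELP.CLASS sample table rather than the question's data: NAME is selected but not grouped, so instead of one summary row per SEX value, PROC SQL keeps all 19 input rows and attaches each row's group mean. SAS also writes a NOTE to the log saying that summary statistics were remerged with the original data, which is a quick way to spot this situation.

```sas
/* NAME is not in GROUP BY, so the data are not collapsed:           */
/* each input row is kept, and avg_height holds the mean height of   */
/* that row's SEX group (the remerged summary statistic).            */
proc sql;
    select name,
           sex,
           mean(height) as avg_height
    from sashelp.class
    group by sex;
quit;
```

Dropping name from the SELECT list turns this back into an ordinary grouped query with one row per SEX value.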

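In your case, the remerging is not obvious. I think there is key confusion because you use the same names in the SELECT as in the original data. My suggestion is to change the aliases so they are unambiguous, and make sure the keys in the GROUP BY are unambiguous too.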
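A sketch of that suggestion applied to the question's query. It assumes (this is not confirmed in the question) that input_table already contains columns with some of the alias names, for example age, so the GROUP BY could resolve to the stored column instead of the calculated one. The _grp suffix is a hypothetical naming choice; any name that does not already exist in input_table would do. The WHERE clause is copied from the question, with the duplicated cost >= 0 test dropped, and still relies on &picking_date. resolving to a valid SAS date literal as in the asker's real code.

```sas
/* Every calculated column gets a name assumed not to exist in          */
/* input_table (hypothetical _grp suffix), so each GROUP BY key can     */
/* only refer to the calculated value, never to a stored column.        */
proc sql;
    create table output as
    select
        (case apple_key
             when "golden_d" then 1
             when "granny_s" then 2
             when "red_d"    then 3
         end) as apple_id_grp,
        apple_key as apple_name_grp,
        (case when cost = 0 then 0
              else 1
         end) as cost_flag_grp,
        (case when ceil(months / 2) < 5 then ceil(months / 2)
              else 5
         end) as age_grp,
        "McDonalds" as farm_grp,
        /* the only aggregated item in the SELECT list */
        sum(profit * price) / sum(price) as price_weighted_profit
    from input_table
    where date = "&picking_date."d
        and price > cost
        and cost >= 0
    group by apple_id_grp, apple_name_grp, cost_flag_grp, age_grp, farm_grp;
quit;
```

If the duplicate rows disappear with the renamed aliases, a name collision was the likely cause; the subquery in Answer 0 sidesteps alias resolution in the GROUP BY altogether.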