I am using proc SQL in SAS, and one of my proc sql queries is behaving very strangely.
I have a large dataset (roughly 1 million rows) that looks like this:
apple_key profit price cost months date
golden_d 0.03 12 4 3 01/12
golden_d 0.03 8 0 2 01/12
granny_s 0.05 15 5 5 02/12
red_d 0.04 13 0 1 01/12
golden_d 0.02 1 2 12 03/14
On this dataset, I run the following query:
%let picking_date = 01/12; /* I simplify here - this part of my code definitely works */
proc sql;
CREATE TABLE output AS
SELECT
(CASE apple_key
WHEN "golden_d" THEN 1
WHEN "granny_s" THEN 2
WHEN "red_d" THEN 3
END) AS apple_id,
apple_key AS apple_name,
(CASE WHEN cost= 0 THEN 0
ELSE 1
END) AS cost_flag,
(CASE
WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
ELSE 5
END) AS age,
"McDonalds" as farm,
sum(profit*price)/sum(price) as price_weighted_profit
FROM input_table
WHERE date = "&picking_date."d
AND price > cost
AND cost >= 0
AND cost >= 0
GROUP BY apple_id, apple_name, cost_flag, age, farm
;
quit;
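When I run this, my GROUP BY statement does not work. I get a bunch of rows for a single group (apple_id, apple_name, cost_flag, age and farm are all identical, but the aggregation is not applied).
However, when I run the GROUP BY as a separate step (as shown below), everything works. I get one row per group, with its "price-weighted profit":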
proc sql;
CREATE TABLE output_tmp AS
SELECT
(CASE apple_key
WHEN "golden_d" THEN 1
WHEN "granny_s" THEN 2
WHEN "red_d" THEN 3
END) AS apple_id,
apple_key AS apple_name,
(CASE WHEN cost= 0 THEN 0
ELSE 1
END) AS cost_flag,
(CASE
WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
ELSE 5
END) AS age,
"McDonalds" as farm
FROM input_table
WHERE date = "&picking_date."d
AND price > cost
AND cost >= 0
AND cost >= 0
;
CREATE TABLE output AS
SELECT
apple_id,
apple_name,
cost_flag,
age,
farm,
sum(profit*price)/sum(price) as price_weighted_profit
FROM output_tmp
GROUP BY apple_id, apple_name, cost_flag, age, farm
;
quit;
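Why does this happen, and how can I fix it? This is driving me a bit crazy... Thanks in advance for your help.
Answer 0 (score: 1)
It does not work because the GROUP BY is not treating the sum(profit*price)/sum(price) expression as an aggregate function. It fails to do so because of the aliases such as age, cost_flag, and so on.
Here is the corrected query: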
Proc sql;
CREATE TABLE output AS
SELECT
apple_id,
apple_name,
cost_flag,
age,
farm,
sum(profit*price)/sum(price) as price_weighted_profit
FROM
(
SELECT
(CASE apple_key
WHEN "golden_d" THEN 1
WHEN "granny_s" THEN 2
WHEN "red_d" THEN 3
END) AS apple_id,
apple_key AS apple_name,
(CASE WHEN cost= 0 THEN 0
ELSE 1
END) AS cost_flag,
(CASE
WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
ELSE 5
END) AS age,
"McDonalds" as farm
FROM input_table
WHERE date = "&picking_date."d
AND price > cost
AND cost >= 0
AND cost >= 0
) a
GROUP BY apple_id, apple_name, cost_flag, age, farm;
quit;
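Let me know if you have any questions.
Answer 1 (score: 0)
Rule of thumb: whenever you use an aggregate function in the select clause, all the remaining columns should be part of the group by. In the question you posted, you use sum(profit*price)/sum(price) but without a matching group by, which causes the problem.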
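The rule of thumb above can be sketched with a small query. This is only an illustration: it assumes the SASHELP.CLASS sample table that ships with SAS, which is not part of the original question.

```sas
/* Minimal sketch, assuming the SASHELP.CLASS sample table is available. */
proc sql;
    /* Every non-aggregated column in SELECT (sex, age) also appears
       in GROUP BY, so the result has one row per (sex, age) group. */
    select sex,
           age,
           mean(height) as avg_height
    from sashelp.class
    group by sex, age;
quit;
```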
Answer 2 (score: 0)
I suspect that what is happening here is remerging. SAS proc sql accepts code like this:
proc sql;
select a.*, count(*)
from a;
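This does not summarize the data. Instead, it puts the overall count on every row. In other words, if the keys in the select do not exactly match the group by, the aggregate functions are calculated based on the group by keys, but the results are put back on the original rows. Other databases do this with a subset of window functions.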
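A hedged sketch of how remerging can be observed and then blocked. It assumes the SASHELP.CLASS sample table (not part of the question) and uses the NOREMERGE option of PROC SQL, which makes SAS refuse queries that would need remerging.

```sas
/* Minimal sketch, assuming the SASHELP.CLASS sample table is available. */
proc sql;
    /* name is selected but not grouped, so SAS remerges: the output
       keeps one row per student, each carrying the mean of its sex
       group. The log should carry a note about remerging summary
       statistics back with the original data. */
    select name, sex, mean(height) as avg_height
    from sashelp.class
    group by sex;
quit;

proc sql noremerge;
    /* With NOREMERGE the same query is rejected with an error
       instead of being silently remerged. */
    select name, sex, mean(height) as avg_height
    from sashelp.class
    group by sex;
quit;
```

Checking the log for the remerging note, or rerunning the failing query under NOREMERGE, is a quick way to confirm whether remerging is what produces the extra rows.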
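In your case, the remerging is not obvious. I think there is some confusion over the keys, because you are using the same names in the select as in the original data. My suggestion is to change the aliases so that they are unambiguous, and to make sure the keys in the group by are unambiguous as well.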
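One possible way to act on this suggestion is sketched below. The grp_ prefixed aliases are hypothetical names chosen only so they cannot collide with columns of input_table, and the group by refers to the select items by position, which PROC SQL allows. This has not been checked against the asker's real data; NOREMERGE is added so that any remaining remerge shows up as an error rather than as extra rows.

```sas
/* Sketch only: grp_* aliases are illustrative assumptions, and
   &picking_date. is taken as defined in the question. */
proc sql noremerge;
    create table output as
    select
        (case apple_key
            when "golden_d" then 1
            when "granny_s" then 2
            when "red_d"    then 3
         end) as grp_apple_id,
        apple_key as grp_apple_name,
        (case when cost = 0 then 0 else 1 end) as grp_cost_flag,
        (case
            when ceil(months / 2) < 5 then ceil(months / 2)
            else 5
         end) as grp_age,
        "McDonalds" as grp_farm,
        sum(profit * price) / sum(price) as price_weighted_profit
    from input_table
    where date = "&picking_date."d
      and price > cost
      and cost >= 0
    /* Group by select-list position so no name can be mistaken
       for a column of input_table. */
    group by 1, 2, 3, 4, 5;
quit;
```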