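I need to sum variables from two data sets and join the results. I'd like to do this in one SQL statement, but it is a one-to-many join. I'm interested in whether the summary variables can be created with a single SELECT statement, for lack of a better description.
The code below computes the summary variable for HOURS incorrectly, because INTERVAL has only 1 record per name/date while DETAIL has multiple records per name/date.
I could of course write multiple steps to accomplish this, but I'd like to know whether it can be done in a single SQL step. Thanks.
Sample code: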
data Detail;
Length Name CallType $25;
input date mmddyy10. name $ calltype $ count;
Format date mmddyy10.;
datalines;
05/01/2014 John Order 5
05/01/2014 John Complaint 6
05/01/2014 Mary Order 7
05/01/2014 Mary Complaint 8
05/01/2014 Joe Order 4
05/01/2014 Joe Complaint 2
05/01/2014 Joe Internal 2
05/02/2014 John Order 6
05/02/2014 John Complaint 4
05/02/2014 Mary Order 9
05/02/2014 Mary Complaint 7
05/02/2014 Joe Order 3
05/02/2014 Joe Complaint 1
05/02/2014 Joe Internal 3
;
data Interval;
Length Name $25;
input date mmddyy10. name $ hours;
Format date mmddyy10.;
datalines;
05/01/2014 John 8
05/01/2014 Mary 6
05/01/2014 Joe 4
05/02/2014 John 8
05/02/2014 Mary 6
05/02/2014 Joe 4
;
PROC SQL noprint feedback;
CREATE TABLE SUMMARY AS
SELECT
D.Name
, Sum(D.Count) as Count
, Sum(I.Hours) as Hours
FROM Detail D, Interval I
WHERE D.Name=I.Name and D.Date=I.Date
GROUP BY D.Name
ORDER BY D.Name;
QUIT;
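To make the inflation visible, here is a minimal diagnostic sketch against the Detail and Interval data sets above. It groups the joined rows by name and date so the repetition of each Interval row can be seen directly. The column names DetailRows, HoursOnce and HoursJoined are made up for illustration.

```sas
proc sql;
  /* After the join, each Interval row appears once per matching Detail row, */
  /* so summing I.Hours adds the same value DetailRows times.                */
  select
      D.Name
    , D.Date
    , count(*)     as DetailRows   /* joined rows for this name/date    */
    , max(I.Hours) as HoursOnce    /* the single Interval value         */
    , sum(I.Hours) as HoursJoined  /* the value the original query sums */
  from Detail D, Interval I
  where D.Name = I.Name and D.Date = I.Date
  group by D.Name, D.Date
  order by D.Name, D.Date;
quit;
```

For example, John has two Detail rows on 05/01/2014 (Order and Complaint), so his 8 hours for that date are added twice in the original query.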
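Answer 0 (score: 2)
The following works and shouldn't be too inefficient. Personally, though, I think the best approach is to summarize both tables independently before merging them.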
PROC SQL noprint feedback;
CREATE TABLE SUMMARY AS
SELECT
D.Name
, Sum(D.Count) as Count
, (SELECT sum(I.Hours) as Hours from Interval I WHERE D.Name=I.Name GROUP BY i.name) as Hours
FROM Detail D
GROUP BY D.Name
ORDER BY D.Name
;
QUIT;
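Answer 1 (score: 2)
Robert's solution works fine, but I get better performance when I move the subqueries into the FROM clause instead of using one in the SELECT. In the FROM clause, each of the two queries is executed only once and the results are joined, whereas a subquery in the SELECT is executed once for every row.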
proc sql;
create table summary as
select
d.name,
count,
hours
from
(select name, sum(count) as count from detail group by name) d inner join
(select name, sum(hours) as hours from interval group by name) i
on d.name = i.name
order by d.name
;
quit;
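One caveat with the inner join above: a name that appears in only one of the two tables is dropped from the result. That does not happen with the sample data, but if it can happen in yours, a full join over the same two summarized subqueries keeps those names. This is only a sketch under that assumption, and summary_all is a hypothetical table name.

```sas
proc sql;
  create table summary_all as
  select
    coalesce(d.name, i.name) as name,  /* take the name from whichever side has it */
    d.count,                           /* missing if the name has no Detail rows   */
    i.hours                            /* missing if the name has no Interval rows */
  from
    (select name, sum(count) as count from detail group by name) d full join
    (select name, sum(hours) as hours from interval group by name) i
    on d.name = i.name
  order by 1                           /* sort by the first output column (name)   */
  ;
quit;
```

With the sample data this should produce the same three rows as the inner join, since every name occurs in both tables.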