Creating summary variables with PROC SQL

Posted: 2014-06-10 15:29:23

Tags: sas proc-sql

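I need to sum variables from two data sets and join the results. I'd like to do this in a single SQL statement, but it is a one-to-many join. I'm interested to know whether I can create what I'll call, for lack of a better description, summary variables with the SELECT statement.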

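The code below computes the HOURS summary variable incorrectly: INTERVAL has only one record per name/date, but DETAIL has several records per name/date, so each HOURS value is summed once for every matching DETAIL record.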

I could of course write several steps to accomplish this, but I'd like to know whether it can be done in a single SQL step. Thanks.

Sample code:

data Detail;
 Length Name CallType $25;
 input date mmddyy10. name $ calltype $ count;
 Format date mmddyy10.;
 datalines;
05/01/2014 John Order 5
05/01/2014 John Complaint 6
05/01/2014 Mary Order 7
05/01/2014 Mary Complaint 8
05/01/2014 Joe Order 4
05/01/2014 Joe Complaint 2
05/01/2014 Joe Internal 2
05/02/2014 John Order 6
05/02/2014 John Complaint 4
05/02/2014 Mary Order 9
05/02/2014 Mary Complaint 7
05/02/2014 Joe Order 3
05/02/2014 Joe Complaint 1
05/02/2014 Joe Internal 3
;

data Interval;
 Length Name $25;
 input date mmddyy10. name $ hours;
 Format date mmddyy10.;
 datalines;
05/01/2014 John 8
05/01/2014 Mary 6
05/01/2014 Joe 4
05/02/2014 John 8
05/02/2014 Mary 6
05/02/2014 Joe 4
;

PROC SQL noprint feedback;
 CREATE TABLE SUMMARY AS
 SELECT
  D.Name
  , Sum(D.Count) as Count
  , Sum(I.Hours) as Hours
 /* Each INTERVAL row matches several DETAIL rows, so Sum(I.Hours) over-counts */
 FROM Detail D, Interval I
 WHERE D.Name=I.Name and D.Date=I.Date
 GROUP BY D.Name
 ORDER BY D.Name;
QUIT;
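To see why HOURS comes out too large, it can help to count how many DETAIL rows each INTERVAL row matches in the join. The sketch below assumes the DETAIL and INTERVAL data sets from the sample code already exist in the session; the table JoinCheck and the columns DetailRows and HoursAsSummed are illustrative names, not part of the original question. John has two DETAIL rows per date, so each 8-hour INTERVAL row is counted twice and his total becomes 2 × 8 + 2 × 8 = 32 instead of 16; Joe has three rows per date, giving 3 × 4 + 3 × 4 = 24 instead of 8.

```sas
/* Diagnostic sketch: assumes DETAIL and INTERVAL from the question exist */
proc sql;
  create table JoinCheck as
  select i.Name,
         i.Date,
         i.Hours,
         count(*) as DetailRows,                          /* DETAIL rows joined to this INTERVAL row */
         calculated DetailRows * i.Hours as HoursAsSummed /* what the one-step query adds up */
  from Interval i
       inner join Detail d
       on i.Name = d.Name and i.Date = d.Date
  group by i.Name, i.Date, i.Hours
  order by i.Name, i.Date;

  /* Print the result */
  select * from JoinCheck;
quit;
```

Summing HoursAsSummed by name reproduces the inflated HOURS values from the query above (John 32, Mary 24, Joe 24).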

2 Answers:

Answer 0 (score: 2)

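This works and shouldn't be too inefficient. Personally, though, I think the best approach is to summarize the two tables independently before merging them: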

PROC SQL noprint feedback;
 CREATE TABLE SUMMARY AS
 SELECT
  D.Name
  , Sum(D.Count) as Count
  /* Correlated subquery: total hours for the current name, taken from INTERVAL only */
  , (SELECT sum(I.Hours) as Hours from Interval I WHERE D.Name=I.Name GROUP BY i.name) as Hours
 FROM Detail D
 GROUP BY D.Name
 ORDER BY D.Name
 ;
QUIT;

Answer 1 (score: 2)

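Robert's solution works fine, but I got better performance by moving the subqueries into the FROM clause instead of using a subquery in the SELECT clause. In the FROM clause, each of the two queries runs only once and their results are joined, whereas a subquery in the SELECT clause runs once for every row.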

    proc sql;
     create table summary as
     select
      d.name,
      count,
      hours
     from
      /* Each derived table is summarized to one row per name before the join */
      (select name, sum(count) as count from detail group by name) d inner join 
      (select name, sum(hours) as hours from interval group by name) i
      on d.name = i.name
     order by d.name
    ;
    quit;
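Both answers group by name only and drop the date condition from the question's original join. If hours should only be counted for name/date pairs that appear in both tables, one option, sketched here under that assumption, is to pre-aggregate DETAIL to one row per name/date inside the FROM clause, join that to INTERVAL on name and date, and then sum by name. The output table name SummaryByDate is arbitrary, and the sketch assumes the DETAIL and INTERVAL data sets from the question.

```sas
/* Sketch: keep the name/date match while avoiding the one-to-many fan-out */
proc sql;
  create table SummaryByDate as
  select i.Name,
         sum(d.Count) as Count,
         sum(i.Hours) as Hours
  from Interval i
       inner join
       /* one row per name/date, so each INTERVAL row matches at most once */
       (select Name, Date, sum(Count) as Count
          from Detail
          group by Name, Date) d
       on i.Name = d.Name and i.Date = d.Date
  group by i.Name
  order by i.Name;
quit;
```

With the sample data this should give Count = 21 and Hours = 16 for John, 31 and 12 for Mary, and 15 and 8 for Joe, the same totals as the two answers, because every name/date in DETAIL also appears in INTERVAL.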