SAS DISTINCT and UNIQUE, but also need an additional variable

Time: 2017-01-05 17:51:33

Tags: sql sas

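I have a dataset from which I'm trying to get frequencies in SAS. Basically, there are a bunch of episodes, and for each one, whether or not someone had an event. An episode can technically have more than one event. Each record also has an organization identifier. I have developed the code below, which correctly identifies the frequency of events per episode (e.g., episodes with 1, 2, and 3 events):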

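The code below works well for giving me a quick frequency of the number of events per episode, but it is missing an important piece of information I need, namely organization_id. Whenever I add organization_id to the code, the frequencies come out wrong. I tried merging the out table back onto table1 to get organization_id, but that inflated the frequencies as well. How can I add the additional variable so that I can ultimately run a frequency that also includes organization_id?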

proc sql;
    create table out as select unique
        episode_id, sum(event) as total_event
    from table1
    group by episode_id;
quit; 

proc freq data=out;
    tables total_event;
run;
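To see why adding organization_id throws the counts off, here is a hypothetical variant of the query above that simply adds it to the GROUP BY. It assumes table1 has the columns episode_id, event and organization_id, as in the sample data step in the answer; out_by_org is a made-up table name.

```sas
/* Hypothetical illustration (out_by_org is a made-up name).          */
/* Grouping by episode AND organization yields one row per pair, so   */
/* total_event becomes a partial, per-organization sum and an episode */
/* shared by two organizations is counted twice by PROC FREQ.         */
proc sql;
    create table out_by_org as
    select episode_id, organization_id, sum(event) as total_event
    from table1
    group by episode_id, organization_id;
quit;

proc freq data=out_by_org;
    tables total_event;
run;
```

With the sample table1 from the answer, this produces 7 rows instead of 4: episode C, whose events are split between organizations 456 and 789, appears twice with totals of 1 and 2 rather than once with 3.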

1 answer:

Answer 0 (score: 0)

Unless I'm misunderstanding your goal, I think you need to merge the summed data back onto itself and then dedupe by ORGANIZATION_ID to get the crosstab you want:

data table1;
    length EPISODE_ID $3 EVENT 3 ORGANIZATION_ID $3;
    input EPISODE_ID $ EVENT ORGANIZATION_ID $;
    datalines;
    A   0   123
    A   1   123
    A   1   456
    B   0   123
    B   1   456
    C   1   456
    C   1   789
    C   1   789
    C   0   789
    D   0   123
    D   0   123
    D   0   123
    D   1   123
    D   1   123
    ;
run;

** sum EVENT over distinct EPISODE_ID **;
proc sql noprint;
    create table out1 as
    select a.*,b.TOTAL_EVENT
    from
        (select * from table1) a,
        (select distinct EPISODE_ID, sum(EVENT) as TOTAL_EVENT from table1 group by EPISODE_ID) b
        where a.EPISODE_ID = b.EPISODE_ID;
quit;

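** dedupe to one row per ORGANIZATION_ID/EPISODE_ID pair **;
** (BY ORGANIZATION_ID alone would keep just one episode per organization) **;
proc sort data = out1(keep = ORGANIZATION_ID EPISODE_ID TOTAL_EVENT) out=out2 nodupkey;
    by ORGANIZATION_ID EPISODE_ID;
run;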

proc freq data=out2;
    tables ORGANIZATION_ID*TOTAL_EVENT;
run;
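The join and the de-duplication can also be folded into a single PROC SQL step. This is a minimal sketch, assuming the goal is to count each organization/episode pair once, with TOTAL_EVENT taken from the whole episode across all organizations. It reuses table1 from the data step above; org_episode is a hypothetical table name.

```sas
/* Sketch: per-episode totals (across all organizations) joined back   */
/* to the detail rows, keeping one row per ORGANIZATION_ID/EPISODE_ID. */
proc sql;
    create table org_episode as
    select distinct a.ORGANIZATION_ID, a.EPISODE_ID, b.TOTAL_EVENT
    from table1 as a
         inner join
         (select EPISODE_ID, sum(EVENT) as TOTAL_EVENT
          from table1
          group by EPISODE_ID) as b
         on a.EPISODE_ID = b.EPISODE_ID;
quit;

/* Episodes per organization, broken down by number of events */
proc freq data=org_episode;
    tables ORGANIZATION_ID*TOTAL_EVENT / nopercent norow nocol;
run;
```

DISTINCT here plays the role of NODUPKEY: the detail rows from table1 repeat each pair once per record, and the per-episode total is identical on all of them, so collapsing them is safe.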

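Alternatively, you can use both EPISODE_ID and ORGANIZATION_ID as CLASS variables in a PROC MEANS step. Then look at the different levels of _TYPE_ and see whether you can run your frequency on one of them: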

proc means data = table1 noprint;
    class EPISODE_ID ORGANIZATION_ID;
    var EVENT;
    output out = means sum=TOTAL_EVENT;
run;
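One way to use those _TYPE_ levels is sketched below. It assumes the default _TYPE_ numbering, where the rightmost CLASS variable is the lowest-order bit, so with CLASS EPISODE_ID ORGANIZATION_ID the value 2 holds full per-episode totals and 3 holds per-episode, per-organization subtotals. org_episode_m is a hypothetical table name.

```sas
/* _TYPE_ levels in the MEANS output for CLASS EPISODE_ID ORGANIZATION_ID: */
/*   0 = grand total                                                       */
/*   1 = ORGANIZATION_ID only                                              */
/*   2 = EPISODE_ID only               (full per-episode totals)           */
/*   3 = EPISODE_ID*ORGANIZATION_ID    (per-organization subtotals)        */
/* Attach the _TYPE_=2 totals to the pairs that exist at _TYPE_=3.         */
proc sql;
    create table org_episode_m as
    select o.ORGANIZATION_ID, o.EPISODE_ID, e.TOTAL_EVENT
    from means(where=(_TYPE_=3)) as o
         inner join
         means(where=(_TYPE_=2)) as e
         on o.EPISODE_ID = e.EPISODE_ID;
quit;

proc freq data=org_episode_m;
    tables ORGANIZATION_ID*TOTAL_EVENT;
run;
```

Filtering the MEANS output to _TYPE_=2 on its own reproduces the question's original one-row-per-episode frequency; the join is only needed to carry ORGANIZATION_ID along, because that column is blank on the _TYPE_=2 rows.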