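I have a dataset from which I am trying to get frequencies in SAS. Basically there are a bunch of episodes, and whether or not someone had an event. An episode can technically have more than one event. Each record also has an organization identifier. I have developed the code below, which correctly identifies the frequency of events per episode (e.g., 1, 2, and 3 events per episode):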
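This code works well for giving me a quick frequency of the number of events per episode, but it lacks an important piece of information I need: organization_id. Whenever I add organization_id to the code, the frequencies come out wrong. I tried merging the out table back onto table1 to get organization_id, but that inflated the frequencies as well. How can I add the other variable so that I can ultimately run a frequency like the one below, but including organization_id?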
proc sql;
create table out as select unique
episode_id, sum(event) as total_event
from table1
group by episode_id;
quit;
proc freq data=out;
tables total_event;
run;
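A minimal sketch of why the counts change, assuming (hypothetically) that organization_id was added to the SELECT list while the query still grouped by episode_id only. PROC SQL then remerges the per-episode sum onto every input row, so an episode recorded under two organizations keeps one row per organization and PROC FREQ counts it twice. The tiny table1 below is made-up illustration data.

```sas
/* Hypothetical data: episode A is recorded under two organizations */
data table1;
  length episode_id $3 organization_id $3;
  input episode_id $ event organization_id $;
  datalines;
A 1 123
A 1 456
B 0 123
B 1 123
;
run;

/* Assumed change: organization_id added, GROUP BY left on episode_id.
   The log reports that summary statistics are remerged with the data. */
proc sql;
  create table out as
  select distinct episode_id, organization_id, sum(event) as total_event
  from table1
  group by episode_id;
quit;

/* Three rows remain: (A,123,2), (A,456,2), (B,123,1).
   TOTAL_EVENT=2 is therefore counted twice although only episode A has it. */
proc freq data=out;
  tables total_event;
run;
```

Grouping by both episode_id and organization_id avoids the remerge, but then the sum is taken within each episode/organization pair, which is a different quantity from the per-episode total.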
Answer 0 (score: 0)
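Unless I am misunderstanding your goal, I think you need to additionally merge the per-episode totals back onto the data itself, and then deduplicate so that each episode is counted once per ORGANIZATION_ID, to get the crosstab you want: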
data table1;
length EPISODE_ID $3 EVENT 3 ORGANIZATION_ID $3;
input EPISODE_ID $ EVENT ORGANIZATION_ID $;
datalines;
A 0 123
A 1 123
A 1 456
B 0 123
B 1 456
C 1 456
C 1 789
C 1 789
C 0 789
D 0 123
D 0 123
D 0 123
D 1 123
D 1 123
;
run;
** sum EVENT over distinct EPISODE_ID **;
proc sql noprint;
create table out1 as
select a.*,b.TOTAL_EVENT
from
(select * from table1) a,
(select distinct EPISODE_ID, sum(EVENT) as TOTAL_EVENT from table1 group by EPISODE_ID) b
where a.EPISODE_ID = b.EPISODE_ID;
quit;
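The inline-view join above can probably be collapsed into one query: because of PROC SQL's default remerge behavior, selecting every column alongside an aggregate with GROUP BY EPISODE_ID attaches the per-episode sum to each original row. A sketch under that assumption, reusing the answer's sample rows; the output name out1_remerge is hypothetical.

```sas
data table1;
  length EPISODE_ID $3 EVENT 3 ORGANIZATION_ID $3;
  input EPISODE_ID $ EVENT ORGANIZATION_ID $;
  datalines;
A 0 123
A 1 123
A 1 456
B 0 123
B 1 456
C 1 456
C 1 789
C 1 789
C 0 789
D 0 123
D 0 123
D 0 123
D 1 123
D 1 123
;
run;

/* Keeps all 14 input rows; TOTAL_EVENT is the sum of EVENT over the
   row's EPISODE_ID (the log notes that the summary was remerged). */
proc sql noprint;
  create table out1_remerge as
  select *, sum(EVENT) as TOTAL_EVENT
  from table1
  group by EPISODE_ID;
quit;
```

If the remerge NOTE in the log is unwelcome, or PROC SQL runs with NOREMERGE, the explicit join in the answer is the more portable choice.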
** dedupe to one row per EPISODE_ID within each ORGANIZATION_ID **;
proc sort data = out1(keep = ORGANIZATION_ID EPISODE_ID TOTAL_EVENT) out=out2 nodupkey;
by ORGANIZATION_ID EPISODE_ID;
run;
proc freq data=out2;
tables ORGANIZATION_ID*TOTAL_EVENT;
run;
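If a data set of counts is more useful than PROC FREQ's printed crosstab, the number of distinct episodes for each ORGANIZATION_ID and TOTAL_EVENT combination can likely be computed directly in PROC SQL. A minimal sketch using the answer's sample rows; the names counts and N_EPISODES are hypothetical. COUNT(DISTINCT ...) makes an episode with several records under the same organization count once.

```sas
data table1;
  length EPISODE_ID $3 EVENT 3 ORGANIZATION_ID $3;
  input EPISODE_ID $ EVENT ORGANIZATION_ID $;
  datalines;
A 0 123
A 1 123
A 1 456
B 0 123
B 1 456
C 1 456
C 1 789
C 1 789
C 0 789
D 0 123
D 0 123
D 0 123
D 1 123
D 1 123
;
run;

/* Per-episode totals (inline view e) joined back to every record,
   then distinct episodes counted per organization and total. */
proc sql noprint;
  create table counts as
  select t.ORGANIZATION_ID,
         e.TOTAL_EVENT,
         count(distinct t.EPISODE_ID) as N_EPISODES
  from table1 as t
       inner join
       (select EPISODE_ID, sum(EVENT) as TOTAL_EVENT
          from table1
          group by EPISODE_ID) as e
       on t.EPISODE_ID = e.EPISODE_ID
  group by t.ORGANIZATION_ID, e.TOTAL_EVENT;
quit;
```

An episode recorded under two organizations appears once under each of them, so the counts can add up to more than the number of episodes (7 versus 4 with these rows).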
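Alternatively, you can put both EPISODE_ID and ORGANIZATION_ID in the CLASS statement of PROC MEANS. Then look at the different levels of _TYPE_ and see whether you can run the frequency on one of them: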
proc means data = table1 noprint;
class EPISODE_ID ORGANIZATION_ID;
var EVENT;
output out = means sum=TOTAL_EVENT;
run;
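With two CLASS variables and no NWAY option, the output data set holds four _TYPE_ levels: 0 for the grand total, 1 for ORGANIZATION_ID alone, 2 for EPISODE_ID alone, and 3 for each EPISODE_ID and ORGANIZATION_ID pair (the rightmost CLASS variable takes the lowest bit). A sketch, using the answer's sample rows, of running PROC FREQ on two of those levels. Note that at _TYPE_=3 the sum is taken within each episode/organization pair, which is not the episode-wide total.

```sas
data table1;
  length EPISODE_ID $3 EVENT 3 ORGANIZATION_ID $3;
  input EPISODE_ID $ EVENT ORGANIZATION_ID $;
  datalines;
A 0 123
A 1 123
A 1 456
B 0 123
B 1 456
C 1 456
C 1 789
C 1 789
C 0 789
D 0 123
D 0 123
D 0 123
D 1 123
D 1 123
;
run;

proc means data = table1 noprint;
  class EPISODE_ID ORGANIZATION_ID;
  var EVENT;
  output out = means sum=TOTAL_EVENT;
run;

/* _TYPE_=2: one row per EPISODE_ID, same totals as the original PROC SQL */
proc freq data=means(where=(_TYPE_ = 2));
  tables TOTAL_EVENT;
run;

/* _TYPE_=3: one row per EPISODE_ID/ORGANIZATION_ID pair; TOTAL_EVENT is
   the sum within that pair only */
proc freq data=means(where=(_TYPE_ = 3));
  tables ORGANIZATION_ID*TOTAL_EVENT;
run;
```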