PROC UNIVARIATE: Output trimmed means to a dataset by id

Time: 2014-02-28 10:43:04

Tags: sas

Following up on my earlier question about writing the trimmed mean from PROC UNIVARIATE to a table:
SAS: PROC UNIVARIATE: Output trimmed mean to dataset

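I want to output a trimmed mean from a single PROC UNIVARIATE call. However, ods output does not seem to work with noprint, and there are too many group ids to run the procedure without noprint. Is there a way around this?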

proc univariate data = Table1  idout trim=1;
var DaysBtwPay;
by id;
ods output trimmedmeans = trimMean2 (keep = id Mean stdMean);
run;

2 answers:

Answer 0: (score: 1)

I can't think of a way around this other than writing your own data step to compute the trimmed mean. It can be done in two steps.

Step-1:

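In this step we find out how many observations each by-group has and the simple mean of the measurement variable. In the next step, the simple mean is returned whenever the trimmed mean cannot be computed. For example, if you want to exclude the 5 extreme observations at each end but a by-group has only 7 observations, PROC UNIVARIATE returns a missing value. Note the order by clause: this sort order is what lets the next step exclude the extreme observations.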

proc sql;
create table inputForTmeans as 
select
a.region /*your by-group var*/
,a.returns /*your measurement variable of interest*/
,b.count
,b.simpleAvgReturns
from sashelp.shoes as a
inner join (select region, count(*) as count, mean(returns) as simpleAvgReturns
                   from sashelp.shoes 
                    group by region) as b
on a.region = b.region
order by 
a.region
,a.returns;
quit;

Step-2:

%let trimmed = 1; /* no. of extreme obs to exclude from each end of a by-group */
data trimmedMean;
    set inputForTmeans;
    by region returns;
    row_count + 1; /* counter variable to number each obs in a by-group */
    if first.region then do;
        row_count = 1;
        returnsSum = .;
    end;
    /* add up only the rows that are not among the lowest or highest extremes */
    if &trimmed. < row_count <= (count - &trimmed.) then returnsSum + returns;
    if last.region then do;
        /* fall back to the simple average when trimming leaves no rows */
        trimmedMeanreturns = coalesce(returnsSum/(count - 2*&trimmed.), simpleAvgReturns);
        N = row_count;
        trimmedRowCount = 2*&trimmed.;
        output;
    end;
    keep region trimmedMeanreturns N count trimmedRowCount;
run;

Output: %let trimmed = 1;

region                     DataStep      ProcUnivariate
Africa                     1183.962963   1183.963
Asia                       662           662
Canada                     3089.1428571  3089.143
Central America/Caribbean  3561.1333333  3561.133
Eastern Europe             2665.137931   2665.138
Middle East                6794.3181818  6794.318
Pacific                    1538.5813953  1538.581
South America              1824.1153846  1824.115
United States              4462.4210526  4462.421
Western Europe             2538.05       2538.05

%let trimmed = 14;

region                    DataStep     ProcUnivariate
Africa                    897.92857143 897.9
Asia                      778.21428571 .
Canada                    1098.1111111 1098.1
Central America/Caribbean 2289.25      2289.3
Eastern Europe            2559.6666667 2559.7
Middle East               8620         .
Pacific                   895.88235294 895.9
South America             1538.5769231 1538.6
United States             4010.4166667 4010.4
Western Europe            1968.5882353 1968.6
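
The two missing values in the ProcUnivariate column belong to by-groups that have too few rows to trim 14 observations from each end. The following is a minimal sketch, not part of the original answer, that flags such groups in advance; the names nObs and fallsBack are made up for illustration.

```sas
%let trimmed = 14; /* same setting as the table above */

proc sql;
    select region
          ,count(*) as nObs
          /* 1 = trimming would leave no rows, so the data step
             falls back to the simple average */
          ,(calculated nObs <= 2*&trimmed.) as fallsBack
    from sashelp.shoes
    group by region;
quit;
```

Given the by-group counts listed in the data step output (Asia 14, Middle East 24), these should be the only two regions flagged when trimmed = 14.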

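Output of the data step with trimmed=1:

Count: number of rows in the by-group

N: ignore this column; it is the same as Count

trimmedRowCount: number of extreme rows excluded (2 x trimmed). If trimmedRowCount >= Count, no rows are left after trimming and trimmedMeanreturns is the simple average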

Region                    count trimmedMeanreturns N trimmedRowCount
Africa                    56    1183.963           56     2
Asia                      14    662                14     2
Canada                    37    3089.143           37     2
Central America/Caribbean 32    3561.133           32     2
Eastern Europe            31    2665.138           31     2
Middle East               24    6794.318           24     2
Pacific                   45    1538.581           45     2
South America             54    1824.115           54     2
United States             40    4462.421           40     2
Western Europe            62    2538.05            62     2

Answer 1: (score: 1)

Doh! It looks like you can simply use ods _all_ close; to suppress the HTML output, instead of going to the trouble of writing your own data step routine.

%let trimmed = 1; 
proc sort data=sashelp.shoes out=have;
by region;
run;
ods _all_ close;
PROC UNIVARIATE DATA=have trimmed=&trimmed. ;
VAR returns;
by region;
ods output TrimmedMeans=trimmedMeansUni  ;
run;
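
One side effect of ods _all_ close; is that the destinations stay closed for later steps until they are reopened. A possible variant, sketched here on the assumption that your SAS release supports ODS EXCLUDE ALL, leaves the destinations open and only suppresses the displayed tables while ods output still captures TrimmedMeans. It reuses the dataset names from the code above.

```sas
%let trimmed = 1;

proc sort data=sashelp.shoes out=have;
    by region;
run;

ods exclude all;   /* stop sending tables to the open destinations */
proc univariate data=have trimmed=&trimmed.;
    var returns;
    by region;
    ods output TrimmedMeans=trimmedMeansUni; /* still written to a dataset */
run;
ods exclude none;  /* restore normal output for later steps */
```

For the data in the question, the same pattern would apply with Table1, DaysBtwPay and id in place of have, returns and region, provided Table1 is sorted by id.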