在询问关于将proc单变量的修剪均值丢弃在表格中的问题之后:
SAS: PROC UNIVARIATE: Output trimmed mean to dataset
我想从一个单独的proc单变量输出一个修剪过的平均值。然而,ods输出似乎不适用于noprint,并且有太多的组ID可供使用。有没有回避这个问题?
proc univariate data = Table1 idout trim=1;
var DaysBtwPay;
by id;
trimmedmeans = trimMean2 (keep = id Mean stdMean);
run;
答案 0 :(得分:1)
除了编写自己的数据步骤来计算trimmed mean
之外,我无法想到这个问题。
这可以分两步完成。
Step-1:
在这一步中,我们想知道每个by-group
中有多少观察值和测量变量的简单平均值。在下一步中,当计算trimmed mean
不可行时,将返回简单平均值。例如:如果您要排除极限5个障碍,但只有by-group
中的7个观测值,则proc univarite
会返回缺失值。
请注意order by
子句 - 此排序用于排除极端障碍。
proc sql;
create table inputForTmeans as
select
a.region /*your by-group var*/
,a.returns /*your measurement variable of interest*/
,b.count
,b.simpleAvgReturns
from sashelp.shoes as a
inner join (select region, count(*) as count, mean(returns) as simpleAvgReturns
from sashelp.shoes
group by region) as b
on a.region = b.region
order by
a.region
,a.returns;
quit;
Step-2:
%let trimmed = 1; /*no. of extreme obs to exclude from mean calculation*/
data trimmedMean;
set inputForTmeans;
row_count+1; /*counter variable to number each obs in a by-group*/
by region returns;
if first.region then do;
row_count=1;
returnsSum=.;
end;
if &trimmed.<row_count <=(count - &trimmed.) then returnsSum+returns;
/***************************************************************************/
if last.region then do;
trimmedMeanreturns = coalesce(returnsSum/(count - 2*&trimmed.), simpleAvgReturns) ;
N = row_count;
trimmedRowCount = 2*&trimmed.;
output;
end;
keep region trimmedMeanreturns N count trimmedRowCount ;
/***************************************************************************/
run;
输出: %let trimmed = 1;
region DataStep ProcUnivariate
Africa 1183.962963 1183.963
Asia 662 662
Canada 3089.1428571 3089.143
Central America/Caribbean 3561.1333333 3561.133
Eastern Europe 2665.137931 2665.138
Middle East 6794.3181818 6794.318
Pacific 1538.5813953 1538.581
South America 1824.1153846 1824.115
United States 4462.4210526 4462.421
Western Europe 2538.05 2538.05
%let trimmed = 14;
region DataStep ProcUnivariate
Africa 897.92857143 897.9
Asia 778.21428571 .
Canada 1098.1111111 1098.1
Central America/Caribbean 2289.25 2289.3
Eastern Europe 2559.6666667 2559.7
Middle East 8620 .
Pacific 895.88235294 895.9
South America 1538.5769231 1538.6
United States 4010.4166667 4010.4
Western Europe 1968.5882353 1968.6
datastep
trimmed=1
的输出:
Count
:by-group
N
:忽略此列 - 与Count
trimmedRowCount
:没有。排除了极端行。如果trimmedRowCount
= Count
,那么trimmedMeanreturns
就是SimpleAverage
Region count trimmedMeanreturns N trimmedRowCount
Africa 56 1183.963 56 2
Asia 14 662 14 2
Canada 37 3089.143 37 2
Central America/Caribbean 32 3561.133 32 2
Eastern Europe 31 2665.138 31 2
Middle East 24 6794.318 24 2
Pacific 45 1538.581 45 2
South America 54 1824.115 54 2
United States 40 4462.421 40 2
Western Europe 62 2538.05 62 2
答案 1 :(得分:1)
Doh!,您似乎可以使用ods _all_ close;
选项来禁止HTML输出,而不是在编写自己的datastep
例程时遇到麻烦。
%let trimmed = 1;
proc sort data=sashelp.shoes out=have;
by region;
run;
ods _all_ close;
PROC UNIVARIATE DATA=have trimmed=&trimmed. ;
VAR returns;
by region;
ods output TrimmedMeans=trimmedMeansUni ;
run;