Question

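I am trying to get the mode of a list of values for a variable. When the mode is not unique, I want to return the average of the modes, so that the subquery that fetches the mode (inside a larger query) does not return two values. However, when the mode is unique, the averaging query returns a missing value for some reason.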

I have the following sample data:

data have;
    input betprice;
    datalines;
1.05
1.05
1.05
6
;
run;

proc print data=have;
run;

proc sql;
select avg(betprice) 
    from
    (select betprice, count(*) as count_betprice from have group by betprice) 
    having count_betprice = max(count_betprice);
quit;

If I add more observations to the betprice field so that the mode is no longer unique, the average is returned.

data have;
    input betprice;
    datalines;
1.05
1.05
1.05
6
6
6
;
run;

proc print data=have;
run;

How can I change this query so that it always returns either the mode or the average of the two most frequent values?

Thanks for any help.

Answer 1

You are in SAS, so why not let SAS calculate the statistic, since that is what it is good at...

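/* Capture the Modes table from PROC UNIVARIATE in WORK.WANT */
ods output modes=want;
proc univariate data=have modes;
    var betprice;
run;
ods output close;

/* Average the modes; with a unique mode this is just the mode itself */
proc means data=want;
    var mode;
    output out=final(keep=betprice) mean=betprice;
run;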

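This does not take long to run, it makes what you are doing much clearer to another programmer, and it is easy to code. If you were not taking the average of the modes, you could do it in a single step.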

Answer 2

First, note that the outer query uses a HAVING clause but has no GROUP BY clause. That does not work.

Here is a working solution:

proc sql;
    create view WORK.V_BETPRICE_FREQ as
    select betprice, count(*) as count_betprice
    from HAVE
    group by betprice
    ;

    select avg(betprice) as final_betprice
    from WORK.V_BETPRICE_FREQ
    where count_betprice = (select max(count_betprice) from WORK.V_BETPRICE_FREQ)
    ;
quit;

I used a view here to avoid duplicating code. If the query inside the view is very CPU-intensive, you may want to replace it with a physical table.

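Edited in response to feedback: I believe you ran into trouble with your query because, in the outer query, you wanted to:
1. apply an aggregate function over all records that remain after filtering, and
2. use an aggregate function in the filter itself.
You cannot do the first with a GROUP BY clause, and you cannot do the second without one.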

So in the final result, I kept the first in the outer query and performed the second in a separate subquery.
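To check the reasoning above against both cases from the question, here is a minimal sketch. The dataset and table names (HAVE_UNIQUE, HAVE_TIED, FREQ_UNIQUE, FREQ_TIED) are made up for illustration, and physical tables stand in for the view so the block runs on its own.

```sas
/* Hypothetical check data: one dataset with a unique mode, one with a tie */
data have_unique;
    input betprice @@;
    datalines;
1.05 1.05 1.05 6
;
run;

data have_tied;
    input betprice @@;
    datalines;
1.05 1.05 1.05 6 6 6
;
run;

proc sql;
    /* Frequency of each distinct value, one table per test case */
    create table freq_unique as
    select betprice, count(*) as count_betprice
    from have_unique
    group by betprice;

    create table freq_tied as
    select betprice, count(*) as count_betprice
    from have_tied
    group by betprice;

    /* Unique mode: only 1.05 has the maximum count, so the average is 1.05 */
    select avg(betprice) as final_betprice
    from freq_unique
    where count_betprice = (select max(count_betprice) from freq_unique);

    /* Tied modes: 1.05 and 6 both have count 3, so the average is 3.525 */
    select avg(betprice) as final_betprice
    from freq_tied
    where count_betprice = (select max(count_betprice) from freq_tied);
quit;
```

The filter on the maximum count lives in the WHERE clause of each outer query, so the outer AVG only ever sees the rows that are modes.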

Answer 3

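This is quite tricky. After 12 years of working with SAS, I don't remember ever using HAVING without GROUP BY, and I would expect it to produce unexpected results.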

So my single-query solution is not very pretty, because it does the grouping twice.

Single-query version:

proc sql;
    select avg(betprice)
    from (select betprice,
                 count(*) as count_betprice
          from work.have
          group by betprice)            /* first summary */
    where count_betprice =
          (select max(count_betprice)   /* the subquery must be in parentheses */
           from (select betprice,
                        count(*) as count_betprice
                 from work.have
                 group by betprice));   /* same summary here */
quit;

Some simplification, using an intermediate table (or a view, if you prefer) instead of the two identical subqueries:

proc sql;
    create table work.freq_sum as
    select betprice,
           count(*) as count_betprice
    from work.have
    group by betprice;

    select avg(betprice)
    from work.freq_sum
    where count_betprice =
          (select max(count_betprice) from work.freq_sum);
quit;

Note that you can compute statistics such as MODE and MEDIAN with PROC MEANS:

proc means data=have n mean mode median;
var betprice;
run;
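If the statistic is needed in a dataset rather than in the printed report, the same procedure can write it out with an OUTPUT statement. This is only a sketch: the output dataset name MODE_ONLY and the variable name MODE_BETPRICE are made up for illustration. As far as I know, PROC MEANS reports the smallest value when several modes are tied, so this gives a single mode, not the average of the modes asked for in the question.

```sas
/* Sketch: write the mode of betprice to a dataset instead of printing it. */
/* MODE_ONLY and MODE_BETPRICE are hypothetical names.                     */
proc means data=have noprint;
    var betprice;
    output out=mode_only(keep=mode_betprice) mode=mode_betprice;
run;

proc print data=mode_only;
run;
```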

SAS: AVERAGE() of a single observation
