Question

Consider the following test dataset:

data test;
input Drug $ Quantity State $ Year;
datalines;
A 10 NY 2013
A 20 NY 2014
B 110 NY 2013
B 210 NY 2014   
A 50 OH 2013
A 60 OH 2014
B 150 OH 2013
B 260 OH 2014       
A 22 NY 2014
B 100 OH 2013
;
RUN;

The following code sums the quantity of drugs A and B by drug and state for 2013:

    proc sql;
    create table testnew as
    select *, sum(Quantity) as total from test
    where Year=2013
    group by Drug,State;
    quit;

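I am interested in getting each drug's share of its state's total. For example, Ohio has 300 units of drugs A and B combined in 2013, so the proportion for A is 50/300 and the proportion for B is 250/300.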

The following code gets the total quantity by state:

  proc sql;
  create table testnew1 as
  select *, sum(Quantity) as total1 from test
  where Year=2013
  group by State;
  quit;

I think I could merge testnew and testnew1 and divide total by total1 to get the proportion. But is there a simpler way?

Answer 1

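First of all, when summarizing variables in SQL, you should avoid including input variables other than the "group by" variables and the summaries in the final table. This prevents duplicate rows.

The first SQL step you wrote outputs 5 rows even though there are only 4 drug/state combinations, because SAS remerges the sum back onto every input row. So, instead of selecting *, it is better to list the grouping variables explicitly and use positional numbers in the "group by" clause: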

proc sql;
    create table testnew as
    select  State,
            Drug, 
            sum(Quantity) as total 
        from test
        where Year=2013
        group by 1, 2;
quit;

To get each drug's proportion of the state total, you can compute the state total in a subquery and then use it directly in the outer query:

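proc sql;
    create table testnew1 as
    select  State,
            Drug, 
            sum(Quantity) as total,
            total_by_state,
            (calculated total) / total_by_state as proportion format=percent9.2
        /* inner query: attach each state's 2013 total to every input row */
        from (select *, 
                    sum(Quantity) as total_by_state
                from test
                where Year=2013
                group by State)
        where Year=2013
        /* column 4 (total_by_state) is grouped too, so SAS does not remerge and repeat rows */
        group by 1, 2, 4;
quit;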

If you want, you can remove the where clauses and include the Year variable in the group by of both the outer and the inner query.
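As a rough sketch of that variant, assuming the same test dataset as in the question, the query might look like the following. The output table name testnew2 is hypothetical, and total_by_state is listed in the outer group by here as an assumption, so that SAS does not remerge the summary and repeat rows.

```sas
proc sql;
    create table testnew2 as          /* testnew2 is a hypothetical name */
    select  Year,
            State,
            Drug,
            sum(Quantity) as total,
            total_by_state,
            (calculated total) / total_by_state as proportion format=percent9.2
        /* inner query: total per Year/State, remerged onto every input row */
        from (select *,
                    sum(Quantity) as total_by_state
                from test
                group by Year, State)
        /* no where clause: Year is a grouping variable instead */
        group by Year, State, Drug, total_by_state;
quit;
```

With the sample data, the 2013 OH rows should still give 50/300 for drug A and 250/300 for drug B, matching the example in the question, and the 2014 figures appear alongside them in the same table.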

