Group by in a subquery in SAS

Asked: 2015-10-10 19:50:19

Tags: sql group-by sas subquery

Hi, I have a dataset like this:

Brand   Category
----------------------
A       1
A       1
A       1
B       1
B       1
C       1
A       2
C       2
C       2
C       2

I want to get the market share of each brand within each category. For example, the market share of A in category 1 is 3/6 = 50%.

I used this SQL code:

    proc sql;
    select
    Brand,
    count(brand) / (select count(category) from dataset group by category) as percent
    from dataset
    group by brand, category;

But SAS reports an error.

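Please help. Thank you so much!

3 answers:

Answer 0 (score: 1)

You need to merge the category totals back onto the brand*category combinations. PROC SQL will do this for you automatically if you ask it to.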

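data have ;
  input Brand $ Category $ @@;   /* @@ reads several observations from one input line */
cards;
A 1 A 1 A 1 B 1 B 1 C 1 A 2 C 2 C 2 C 2
;

proc sql;
  select brand
       , category
       , nobs
       , sum(nobs) as cat_total                 /* category total, remerged onto every row */
       , nobs/calculated cat_total as percent   /* share of the brand within its category */
   from (select category,brand,count(*) as nobs /* one row per category/brand pair */
         from have 
         group by 1,2
        )
   group by category
   order by 1,2
 ;
quit;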

NOTE: The query requires remerging summary statistics back with the original data.
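
To make the remerge step concrete outside SAS, here is a minimal sketch in plain Python that follows the same three steps as the query above: count per brand/category pair, total per category, then attach each total back to its rows. The names `rows`, `nobs`, `cat_total`, and `result` are illustrative, and the categories are stored as integers here for brevity, whereas the data step above reads them as character values.

```python
from collections import Counter

# Sample rows from the question: (brand, category).
rows = [("A", 1), ("A", 1), ("A", 1), ("B", 1), ("B", 1), ("C", 1),
        ("A", 2), ("C", 2), ("C", 2), ("C", 2)]

# Step 1: the inner query, one count per (brand, category) pair.
nobs = Counter(rows)

# Step 2: the category totals, summed from the step 1 counts.
cat_total = Counter()
for (brand, category), n in nobs.items():
    cat_total[category] += n

# Step 3: the remerge, attaching each category total back to its rows.
# Sorting by (brand, category) mirrors "order by 1,2" in the query.
result = [
    (brand, category, n, cat_total[category], n / cat_total[category])
    for (brand, category), n in sorted(nobs.items())
]

for row in result:
    print(row)
```

Each tuple holds brand, category, nobs, cat_total, and percent, in the same column order as the PROC SQL select list.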

Answer 1 (score: 0)

select count(category) from dataset group by category

This subquery returns multiple rows: one count for every category. But you want the count for one specific category, so replace it with
select count(category) from dataset where category = d.category

and make sure you give dataset an alias, i.e. from dataset d
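
Putting the pieces together, the full correlated-subquery version might look like the sketch below. It runs the SQL through Python's built-in sqlite3 module rather than SAS, purely so the logic can be checked. The `1.0 *` factor is a SQLite-only assumption: SQLite divides integers as integers, while SAS numeric values are always floating point, so PROC SQL should not need it. The names `rows`, `query`, and `shares` are illustrative.

```python
import sqlite3

# Sample rows from the question: (brand, category).
rows = [("A", 1), ("A", 1), ("A", 1), ("B", 1), ("B", 1), ("C", 1),
        ("A", 2), ("C", 2), ("C", 2), ("C", 2)]

conn = sqlite3.connect(":memory:")
conn.execute("create table dataset (brand text, category integer)")
conn.executemany("insert into dataset values (?, ?)", rows)

# The outer table gets the alias d so the subquery can refer to d.category.
# The unqualified "category" inside the subquery is the inner table's column.
# 1.0 * forces floating-point division in SQLite only.
query = """
select d.brand,
       d.category,
       1.0 * count(*) /
           (select count(*) from dataset where category = d.category) as percent
from dataset d
group by d.brand, d.category
order by d.category, d.brand
"""

shares = {(brand, category): percent
          for brand, category, percent in conn.execute(query)}

for key, percent in shares.items():
    print(key, round(percent, 4))
```

Because the subquery is filtered to a single category, it returns exactly one value per outer row instead of one row per category.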

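Here is another way, using derived tables: one derived table holds the count for each brand/category pair, and the second holds the total count for each category.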

select cnt/total as percent, t1.brand, t1.category 
from (
    select count(*) as cnt, brand, category
    from dataset 
    group by brand, category
) t1 join (
   select count(*) as total, category
   from dataset 
   group by category
) t2 on t2.category = t1.category

Answer 2 (score: 0)

I would use proc freq, as Tom mentioned.

proc freq data = yourdata;
table brand*category/missprint list;
run;

This should give you the percentages you need without any complex SQL programming.