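Show all values of categorical variables

Time: 2017-12-12 11:56:38

Tags: sas

This has been hard to search for on Google. I have two categorical variables, age and month, with 7 levels each. For a few combinations of levels, say age = 7 and month = 7, there is no value, and when I use proc sql the intersections that have no entries do not show up. For example: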

    age   month value
     1       1    4
     2       1    12
     3       1    5
      ....
     7       1    6
     ...
     1       7    8
      ....
     5       7    44
     6       7    5 
     THIS LINE DOESN'T SHOW

What I want:

    age   month value
     1       1    4
     2       1    12
     3       1    5
      ....
     7       1    6
     ...
     1       7    8
      ....
     5       7    44
     6       7    5 
     7       7    0

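This happens a few times in the data: the last group has no value, so its rows are not shown, but I would like them to appear because I need them later.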

3 answers:

Answer 0: (score: 1)

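You have a few options. The other answers both seem to work on the premise of building a master table of all combinations and then merging the data onto it. Another approach is to use PRELOADFMT with formats, or the CLASSDATA option.

Lastly, and probably the easiest: if every month and every age appears at least once somewhere in the data set, use the SPARSE option in PROC FREQ. It creates all possible combinations of the levels present.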

proc freq data=have;
   table age*month /out = want SPARSE;
   weight value;
run;
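
The CLASSDATA route mentioned above might look like the following sketch. It assumes the seven levels of each variable are known in advance. The data set names `have_demo`, `levels`, `want_cd` and `want_cd0` are hypothetical, and the sample rows are only a few taken from the question. Since EXCLUSIVE is not specified, PROC MEANS should output every combination listed in `levels` as well as any found in the data. Combinations with no input rows come out with a missing sum, which the last step turns into 0.

```sas
/* Hypothetical sample: a few rows from the question; (7,7) is absent */
data have_demo;
  input age month value;
datalines;
1 1 4
2 1 12
7 1 6
1 7 8
6 7 5
;
run;

/* Every age*month combination that must appear in the result */
data levels;
  do age = 1 to 7;
    do month = 1 to 7;
      output;
    end;
  end;
run;

/* CLASSDATA= adds the combinations from LEVELS that the data lacks */
proc means data=have_demo classdata=levels nway noprint;
  class age month;
  var value;
  output out=want_cd(drop=_type_ _freq_) sum=value;
run;

/* Empty cells have a missing sum; replace it with 0 */
data want_cd0;
  set want_cd;
  value = coalesce(value, 0);
run;
```

Unlike SPARSE, this does not depend on each level appearing somewhere in the data, because the levels come from the separate `levels` data set.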

Answer 1: (score: 0)

First, some sample data:

data test;
do age=1 to 7;
    do month=1 to 12;
        value = ceil(10*ranuni(1));
        if ranuni(1) < .9 then
            output;
    end;
end;
run;

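This leaves some holes, notably (1,1).

I would use a series of SQL statements to get the levels of each variable, cross join those levels, and then left join the values onto the result, using coalesce to put in 0 where the value is missing.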

proc sql;
create table ages as 
select distinct age from test;

create table months as 
select distinct month from test;

create table want as
select a.age,
       a.month,
       coalesce(b.value,0) as value
    from (
            select age, month from ages, months
         ) as a
      left join
         test as b
      on a.age = b.age
       and a.month = b.month;
quit;

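Answer 2: (score: 0)

A group-independent crossing of the classification variables requires a distinct selection of each variable's levels, cross joined with those of the other variables. This forms a shell that can be left joined to the original data. For the case where an age*month combination has more than one item, you need to decide whether you want

  • rows with repeated age and month and the original values
  • rows with distinct age and month, with either
    • an aggregate function to summarize the values, or
    • an indication that there are too many values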

data have;
  input age month value;
  /* (8,8) appears twice, giving a duplicated age*month crossing */
datalines;
     1       1    4
     2       1    12
     3       1    5
     7       1    6
     1       7    8
     5       7    44
     6       7    5
     8       8    1
     8       8    11
;
run;

proc sql;
  create table want1(label="Original class combos including duplicates and zeros for absent cross joins")
  as
  select 
    allAges.age
  , allMonths.month
  , coalesce(have.value,0) as value
  from
    (select distinct age from have) as allAges
  cross join
    (select distinct month from have) as allMonths
  left join
    have
  on 
    have.age = allAges.age and have.month = allMonths.month
  order by 
    allMonths.month, allAges.age
  ;
quit;

A slight variation that flags duplicated class crossings:

proc format;
  value S_V_V .t = 'Too many source values'; /* single valued value */
run;

proc sql;
  create table want2(label="Distinct class combos allowing only one contributor to value, or defaulting to zero when none")
  as
  select distinct
    allAges.age
  , allMonths.month
  , case 
      when count(*) = 1 then coalesce(have.value,0)
      else .t
    end as value format=S_V_V.
  , count(*) as dup_check
  from
    (select distinct age from have) as allAges
  cross join
    (select distinct month from have) as allMonths
  left join
    have
  on 
    have.age = allAges.age and have.month = allMonths.month
  group by
    allMonths.month, allAges.age
  order by 
    allMonths.month, allAges.age
  ;
quit;
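
The aggregate option from the list above is not shown in the answer. A minimal sketch of it, assuming the `have` data set defined earlier in this answer and that summing is the summary you want (the table name `want3` is hypothetical):

```sas
proc sql;
  create table want3(label="Distinct class combos with values summed, zero when none")
  as
  select
    allAges.age
  , allMonths.month
    /* sum() is missing when no row of HAVE matches; coalesce turns that into 0 */
  , coalesce(sum(have.value), 0) as value
  from
    (select distinct age from have) as allAges
  cross join
    (select distinct month from have) as allMonths
  left join
    have
  on
    have.age = allAges.age and have.month = allMonths.month
  group by
    allMonths.month, allAges.age
  order by
    allMonths.month, allAges.age
  ;
quit;
```

With the sample data, the duplicated (8,8) crossing collapses to a single row with value 12, and every absent crossing gets 0.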

This kind of processing can also be done in Proc TABULATE using the CLASSDATA= option.
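
A rough sketch of the TABULATE approach, again assuming the `have` data set from this answer. The shell data set `allCombos` and its 1 to 8 ranges are hypothetical stand-ins for whatever levels you need reported.

```sas
/* Hypothetical shell: every age*month combination to report */
data allCombos;
  do age = 1 to 8;
    do month = 1 to 8;
      output;
    end;
  end;
run;

proc tabulate data=have classdata=allCombos;
  class age month;
  var value;
  /* MISSTEXT= prints 0 in cells that have no contributing rows */
  table age, month*value=' '*sum=' ' / misstext='0';
run;
```

This produces a printed cross-tabulation rather than a long table, so it fits best when the zero-filled grid is wanted for a report.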