Question

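This has been hard to search for. I have two categorical variables, age and month, each with 7 levels. For some combinations of levels, such as age = 7 and month = 7, there is no value, and when I use proc sql the combinations that have no entries do not show up, for example: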

    age   month value
     1       1    4
     2       1    12
     3       1    5
      ....
     7       1    6
     ...
     1       7    8
      ....
     5       7    44
     6       7    5 
     THIS LINE DOESNT SHOW

What I want:

    age   month value
     1       1    4
     2       1    12
     3       1    5
      ....
     7       1    6
     ...
     1       7    8
      ....
     5       7    44
     6       7    5 
     7       7    0

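This happens a few times in the data, where the last group has no value, so those rows are not shown. I would like them to be present for later use.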

Answer 1

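You have a couple of options available; both seem to work on the premise of building a master data set of all combinations and then merging your data into it. Another approach is to use PRELOADFMT with FORMATs, or the CLASSDATA= option.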
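As a rough sketch of the PRELOADFMT route, assuming a data set named `have` with numeric `age`, `month` and `value`, and assuming the levels are 1 through 7 (the format name `lvl7f` and the output name `want_fmt` are made up for illustration):

```sas
/* Hypothetical format listing every level that should appear (1-7 assumed). */
proc format;
   value lvl7f
      1 = '1'  2 = '2'  3 = '3'  4 = '4'
      5 = '5'  6 = '6'  7 = '7';
run;

/* COMPLETETYPES together with PRELOADFMT emits every age*month
   combination defined by the format, even when the data has no rows for it. */
proc summary data=have nway completetypes;
   class age month / preloadfmt;
   format age month lvl7f.;
   var value;
   output out=want_fmt(drop=_type_ _freq_) sum=value;
run;

/* Combinations without data come back with a missing sum; set them to 0. */
data want_fmt;
   set want_fmt;
   if missing(value) then value = 0;
run;
```

The format acts as the list of expected levels, so this works even when a level never occurs in the data at all.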

Lastly, and probably easiest: if every month and every age appears somewhere in the data set, use the SPARSE option in PROC FREQ. It generates all possible combinations.

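    proc freq data=have;
       /* SPARSE writes every age*month combination to the OUT= data set */
       table age*month / out=want sparse;
       /* weighting by value makes COUNT the total of value per cell (0 when absent) */
       weight value;
    run;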

Answer 2

First, some sample data:

    /* Sample data: each age*month cell is kept with probability 0.9 */
    data test;
        do age = 1 to 7;
            do month = 1 to 12;
                value = ceil(10*ranuni(1));
                if ranuni(1) < .9 then output;
            end;
        end;
    run;

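This leaves some holes, notably (1,1).

I would use a series of SQL statements to get the levels, cross join them, and then join the values back in, using COALESCE to put in 0 where the value is missing.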

    proc sql;
    /* distinct levels of each variable */
    create table ages as
    select distinct age from test;

    create table months as
    select distinct month from test;

    /* cross join the levels, then left join the values; missing becomes 0 */
    create table want as
    select a.age,
           a.month,
           coalesce(b.value,0) as value
        from (
                select age, month from ages, months
             ) as a
          left join
             test as b
          on a.age = b.age
           and a.month = b.month;
    quit;

Answer 3

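Crossing the classification variables independently of the data means selecting the distinct levels of each variable and cross joining them with each other. This forms a shell of every combination that can be left joined to the original data. When an age*month combination has more than one row, you need to decide whether you want

- rows with repeated age and month, each keeping its original value, or
- rows with distinct age and month, with either
  - an aggregate function to summarize the values, or
  - an indication that there are too many values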

    /* note the duplicated (8,8) combination */
    data have;
    input age   month value;
    datalines;
         1       1    4
         2       1    12
         3       1    5
         7       1    6
         1       7    8
         5       7    44
         6       7    5
         8       8    1
         8       8    11
    ;
    run;

    proc sql;
      create table want1(label="Original class combos including duplicates and zeros for absent cross joins")
      as
      select
        allAges.age
      , allMonths.month
      , coalesce(have.value,0) as value
      from
        (select distinct age from have) as allAges
      cross join
        (select distinct month from have) as allMonths
      left join
        have
      on
        have.age = allAges.age and have.month = allMonths.month
      order by
        allMonths.month, allAges.age
      ;
    quit;

A slight variation that flags duplicated class crossings:

    proc format;
      value S_V_V .t = 'Too many source values'; /* single valued value */
    run;

    proc sql;
      create table want2(label="Distinct class combos allowing only one contributor to value, or defaulting to zero when none")
      as
      select distinct
        allAges.age
      , allMonths.month
      , case
          /* one row in the group: a single source value, or no match at all (0) */
          when count(*) = 1 then coalesce(have.value,0)
          /* more than one source row: flag with the special missing value .t */
          else .t
        end as value format=S_V_V.
      , count(*) as dup_check
      from
        (select distinct age from have) as allAges
      cross join
        (select distinct month from have) as allMonths
      left join
        have
      on
        have.age = allAges.age and have.month = allMonths.month
      group by
        allMonths.month, allAges.age
      order by
        allMonths.month, allAges.age
      ;
    quit;
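The remaining option from the list, summarizing duplicates with an aggregate function, could look roughly like this. It is only a sketch that reuses the `have` data set from this answer; the table name `want3` and the choice of SUM as the aggregate are assumptions made for illustration.

```sas
proc sql;
  create table want3 as
  select
    allAges.age
  , allMonths.month
    /* SUM over a group with no matching rows is missing, so default it to 0 */
  , coalesce(sum(have.value), 0) as value
  from
    (select distinct age from have) as allAges
  cross join
    (select distinct month from have) as allMonths
  left join
    have
  on
    have.age = allAges.age and have.month = allMonths.month
  group by
    allMonths.month, allAges.age
  order by
    allMonths.month, allAges.age
  ;
quit;
```

With the sample data, the duplicated (8,8) combination collapses to one row with value 12, and every combination absent from `have` gets 0.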

This kind of processing can also be done in Proc TABULATE using the CLASSDATA= option.
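A minimal sketch of that approach, assuming the `have` data set from this answer; the `classes` table name is hypothetical, and the shell is built from `have` itself so that the class variables match in type and format:

```sas
/* Shell of every age*month combination found in the data. */
proc sql;
  create table classes as
  select allAges.age, allMonths.month
  from (select distinct age from have) as allAges
  cross join (select distinct month from have) as allMonths;
quit;

/* CLASSDATA= adds the combinations from the shell to the table;
   MISSTEXT= prints 0 in cells that have no data. */
proc tabulate data=have classdata=classes;
  class age month;
  var value;
  table age, month*value=' '*sum=' ' / misstext='0';
run;
```

This produces a report rather than a data set, and duplicated combinations are summed within each cell.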

Show all values of categorical variables
