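Googling this has been difficult. I have two categorical variables, age and month, each with 7 levels. Some combinations of levels, for example age = 7 and month = 7, have no value, and when I use proc sql the combinations with no entries do not appear in the output, for example: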
age month value
1 1 4
2 1 12
3 1 5
....
7 1 6
...
1 7 8
....
5 7 44
6 7 5
THIS LINE DOESN'T SHOW
What I want:
age month value
1 1 4
2 1 12
3 1 5
....
7 1 6
...
1 7 8
....
5 7 44
6 7 5
7 7 0
This happens several times in the data: the last group has no value, so its rows are not shown, but I want them present for later use.
Answer 0 (score: 1)
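You have a couple of options available in the other answers; both work on the premise of creating a master data set of all combinations and then merging the values in. Another approach is to use PRELOADFMT with formats, or the CLASSDATA= option.
Lastly, and possibly the easiest: if every month and every age appears at least once in the data set, use the SPARSE option in PROC FREQ. It creates all possible combinations.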
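proc freq data=have;
  /* SPARSE writes every age*month combination to the OUT= data set, */
  /* including combinations that never occur in the data.            */
  /* With WEIGHT, the COUNT column in WANT holds the sum of value.   */
  table age*month / out=want sparse;
  weight value;
run;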
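The PRELOADFMT route mentioned above could look roughly like the following. This is a minimal sketch, not a tested solution: it assumes the data set is called have, that the levels are exactly 1 to 7 as in the question, and the format name lvl_f and output name want_fmt are made up for illustration. PRELOADFMT has to be combined with COMPLETETYPES (or EXCLUSIVE, or ORDER=DATA) to take effect.

```sas
/* Hypothetical format listing the seven levels named in the question */
proc format;
  value lvl_f
    1 = '1'  2 = '2'  3 = '3'  4 = '4'
    5 = '5'  6 = '6'  7 = '7';
run;

/* COMPLETETYPES + PRELOADFMT: emit every combination of formatted levels */
proc means data=have completetypes nway noprint;
  class age month / preloadfmt;
  format age month lvl_f.;
  var value;
  output out=want_fmt(drop=_type_ _freq_) sum=value;
run;

/* Combinations absent from HAVE come out with a missing sum; set them to 0 */
data want_fmt;
  set want_fmt;
  if missing(value) then value = 0;
run;
```

Unlike SPARSE, this does not depend on every level appearing somewhere in the data, because the levels come from the format rather than from the observations.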
Answer 1 (score: 0)
First, some sample data:
data test;
  /* Loop over every age x month combination; keep each with probability 0.9 */
  do age=1 to 7;
    do month=1 to 12;
      value = ceil(10*ranuni(1));
      if ranuni(1) < .9 then
        output;
    end;
  end;
run;
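This leaves some holes, notably (1,1).
I would use a series of SQL statements to get the levels, cross join them, and then left join the values onto that, using coalesce to put a 0 where the value is missing.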
proc sql;
  /* Distinct levels of each classification variable */
  create table ages as
    select distinct age from test;
  create table months as
    select distinct month from test;

  /* Cross join the levels, then left join the observed values */
  create table want as
    select a.age,
           a.month,
           coalesce(b.value,0) as value
    from (
      select age, month from ages, months
    ) as a
    left join
      test as b
    on a.age = b.age
    and a.month = b.month;
quit;
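A quick way to check that the holes were filled is to compare the row count of the result with the product of the level counts. This is only a sanity-check sketch; the column names n_rows and n_expected are made up for illustration, and it assumes each age and month pair occurs at most once in test, as in the sample data above.

```sas
proc sql;
  /* The two numbers should match when every combination is present exactly once */
  select count(*) as n_rows,
         count(distinct age) * count(distinct month) as n_expected
  from want;
quit;
```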
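Answer 2 (score: 0)
Crossing the classification variables independently of the data requires selecting the distinct levels of each variable and cross joining them with the levels of the other variables. This forms a shell of all combinations that can be left joined to the original data.
When an age*month combination has more than one row, you need to decide how to handle it: keep the repeated rows with their original values, or keep one row per combination and flag that it had too many source values.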
data have;
input age month value;
datalines;
1 1 4
2 1 12
3 1 5
7 1 6
1 7 8
5 7 44
6 7 5
8 8 1
8 8 11
;
run;
proc sql;
create table want1(label="Original class combos including duplicates and zeros for absent cross joins")
as
select
allAges.age
, allMonths.month
, coalesce(have.value,0) as value
from
(select distinct age from have) as allAges
cross join
(select distinct month from have) as allMonths
left join
have
on
have.age = allAges.age and have.month = allMonths.month
order by
allMonths.month, allAges.age
;
quit;
A slight variation that flags duplicated class crossings:
proc format;
  /* S_V_V = single valued value; the special missing .t marks duplicates */
  value S_V_V .t = 'Too many source values';
run;
proc sql;
create table want2(label="Distinct class combos allowing only one contributor to value, or defaulting to zero when none")
as
select distinct
allAges.age
, allMonths.month
, case
when count(*) = 1 then coalesce(have.value,0)
else .t
end as value format=S_V_V.
, count(*) as dup_check
from
(select distinct age from have) as allAges
cross join
(select distinct month from have) as allMonths
left join
have
on
have.age = allAges.age and have.month = allMonths.month
group by
allMonths.month, allAges.age
order by
allMonths.month, allAges.age
;
quit;
This kind of processing can also be done in Proc TABULATE using the CLASSDATA= option.
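A minimal sketch of the CLASSDATA= approach, assuming the levels are 1 to 7 for both variables as in the question; the data set name all_levels is hypothetical. The CLASSDATA= data set has to contain every class variable with the same type and format as in the input data.

```sas
/* Hypothetical shell data set: every age x month combination, assuming levels 1-7 */
data all_levels;
  do age = 1 to 7;
    do month = 1 to 7;
      output;
    end;
  end;
run;

/* CLASSDATA= adds combinations that are absent from HAVE;   */
/* MISSTEXT= prints 0 in cells that would otherwise be blank */
proc tabulate data=have classdata=all_levels;
  class age month;
  var value;
  table age*month, value*sum / misstext='0';
run;
```

Adding the EXCLUSIVE option to the PROC TABULATE statement would restrict the table to the combinations in all_levels, which drops levels such as age 8 that exist only in the data.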