我使用proc sort对多个变量的数据集进行了排序,然后继续执行数据步骤,根据排序后的字段分配计数变量。多次运行此代码将得到相同的答案。 如果我使用proc sql和order by而不是proc sort,然后使用相同的datastep代码,则会得到不同的答案。 为什么会发生这种情况?
答案 0 :(得分:1)
您的KEY变量具有重复项,其中第三个(或更多)变量不同。由于您没有指定第三个(或其他)变量,所以排序不完全相同。
这是一个例子。
/*Creates duplicate records with variables that are different for the remaining values*/
data temp_class;
set class (obs=3);
call streaminit(25);
weight = weight + rand('normal', 0, 5);
run;
data class;
set sashelp.class temp_class;
run;
*proc sort;
proc sort data=class out=class2;
by name sex;
run;
*proc sql sort;
proc sql;
create table class3 as
select *
from class
order by name, sex;
quit;
*comparison to show difference;
proc compare data=class2 compare=class3;
run;
结果表明,由于关键变量的定义不明确,这些结果的确产生了不同的结果。 为避免这种情况,还应按变量WEIGHT排序,以确保获得所需的顺序。
The COMPARE Procedure
Comparison of WORK.CLASS2 with WORK.CLASS3
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs
WORK.CLASS2 27SEP18:16:15:28 27SEP18:16:15:28 5 22
WORK.CLASS3 27SEP18:16:14:47 27SEP18:16:14:47 5 22
Variables Summary
Number of Variables in Common: 5.
Observation Summary
Observation Base Compare
First Obs 1 1
First Unequal 1 1
Last Unequal 4 4
Last Obs 22 22
Number of Observations in Common: 22.
Total Number of Observations Read from WORK.CLASS2: 22.
Total Number of Observations Read from WORK.CLASS3: 22.
Number of Observations with Some Compared Variables Unequal: 4.
Number of Observations with All Compared Variables Equal: 18.
Values Comparison Summary
Number of Variables Compared with All Observations Equal: 4.
Number of Variables Compared with Some Observations Unequal: 1.
Total Number of Values which Compare Unequal: 4.
Maximum Difference: 5.4405.
Variables with Unequal Values
Variable Type Len Ndif MaxDif
Weight NUM 8 4 5.440
The COMPARE Procedure
Comparison of WORK.CLASS2 with WORK.CLASS3
(Method=EXACT)
Value Comparison Results for Variables
__________________________________________________________
|| Base Compare
Obs || Weight Weight Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 112.5000 117.3936 4.8936 4.3499
2 || 117.3936 112.5000 -4.8936 -4.1686
3 || 84.0000 78.5595 -5.4405 -6.4767
4 || 78.5595 84.0000 5.4405 6.9253
__________________________________________________________
这里是变量列表中包含WEIGHT的版本,以确保排序正确。
*proc sort;
proc sort data=class out=class4;
by name sex;
run;
*proc sql sort;
proc sql;
create table class5 as
select *
from class
order by name, sex;
quit;
*comparison to show difference;
proc compare data=class4 compare=class5;
run;
在这种情况下,结果没有区别:
The COMPARE Procedure
Comparison of WORK.CLASS4 with WORK.CLASS5
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs
WORK.CLASS4 27SEP18:16:21:13 27SEP18:16:21:13 5 22
WORK.CLASS5 27SEP18:16:21:13 27SEP18:16:21:13 5 22
Variables Summary
Number of Variables in Common: 5.
Observation Summary
Observation Base Compare
First Obs 1 1
Last Obs 22 22
Number of Observations in Common: 22.
Total Number of Observations Read from WORK.CLASS4: 22.
Total Number of Observations Read from WORK.CLASS5: 22.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 22.
NOTE: No unequal values were found. All values compared are exactly equal.