SAS Proc排序与Proc SQL排序不同

时间:2018-10-14 03:36:44

标签: sorting sas

我使用proc sort对多个变量的数据集进行了排序,然后继续执行数据步骤,根据排序后的字段分配计数变量。多次运行此代码将得到相同的答案。 如果我使用proc sql和order by而不是proc sort,然后使用相同的datastep代码,则会得到不同的答案。 为什么会发生这种情况?

1 个答案:

答案 0 :(得分:1)

您的KEY变量具有重复项,其中第三个(或更多)变量不同。由于您没有指定第三个(或其他)变量,所以排序不完全相同。

这是一个例子。

/*Creates duplicate records with variables that are different for the remaining values*/

data temp_class;
set class (obs=3);
call streaminit(25);
weight = weight + rand('normal', 0, 5);
run;

data class;
set sashelp.class temp_class;
run;


*proc sort;
proc sort data=class out=class2;
by name sex;
run;

*proc sql sort;
proc sql;
create table class3 as
select *
from class
order by name, sex;
quit;

*comparison to show difference;
proc compare data=class2 compare=class3;
run;

结果表明,由于关键变量的定义不明确,这些结果的确产生了不同的结果。 为避免这种情况,还应按变量WEIGHT排序,以确保获得所需的顺序。

              The COMPARE Procedure                                                        
                                         Comparison of WORK.CLASS2 with    WORK.CLASS3                                             
                                                       (Method=EXACT)                                                           

                                                     Data Set Summary                                                           

                               Dataset               Created          Modified  NVar    NObs                                    

                               WORK.CLASS2  27SEP18:16:15:28  27SEP18:16:15:28     5      22                                    
                               WORK.CLASS3  27SEP18:16:14:47  27SEP18:16:14:47     5      22                                    


                                                     Variables Summary                                                          

                                           Number of Variables in Common: 5.                                                    


                                                    Observation Summary                                                         

                                               Observation      Base  Compare                                                   

                                               First Obs           1        1                                                   
                                               First Unequal       1        1                                                   
                                               Last  Unequal       4        4                                                   
                                               Last  Obs          22       22                                                   

                              Number of Observations in Common: 22.                                                             
                              Total Number of Observations Read from WORK.CLASS2: 22.                                           
                              Total Number of Observations Read from WORK.CLASS3: 22.                                           

                              Number of Observations with Some Compared Variables Unequal: 4.                                   
                              Number of Observations with All Compared Variables Equal: 18.                                     


                                                 Values Comparison Summary                                                      

                              Number of Variables Compared with All Observations Equal: 4.                                      
                              Number of Variables Compared with Some Observations Unequal: 1.                                   
                              Total Number of Values which Compare Unequal: 4.                                                  
                              Maximum Difference: 5.4405.                                                                       


                                               Variables with Unequal Values                                                    

                                            Variable  Type  Len  Ndif   MaxDif                                                  

                                            Weight    NUM     8     4    5.440                                                  

                                                   The COMPARE Procedure                                                        
                                         Comparison of WORK.CLASS2 with WORK.CLASS3                                             
                                                       (Method=EXACT)                                                           

                                           Value Comparison Results for Variables                                               

                                 __________________________________________________________                                     
                                            ||       Base    Compare                                                            
                                        Obs ||     Weight     Weight      Diff.     % Diff                                      
                                  ________  ||  _________  _________  _________  _________                                      
                                            ||                                                                                  
                                         1  ||   112.5000   117.3936     4.8936     4.3499                                      
                                         2  ||   117.3936   112.5000    -4.8936    -4.1686                                      
                                         3  ||    84.0000    78.5595    -5.4405    -6.4767                                      
                                         4  ||    78.5595    84.0000     5.4405     6.9253                                      
                                 __________________________________________________________                                 

这里是变量列表中包含WEIGHT的版本,以确保排序正确。

*proc sort;
proc sort data=class out=class4;
by name sex;
run;

*proc sql sort;
proc sql;
create table class5 as
select *
from class
order by name, sex;
quit;

*comparison to show difference;
proc compare data=class4 compare=class5;
run;

在这种情况下,结果没有区别:

  The COMPARE Procedure                                                        
                                         Comparison of WORK.CLASS4 with WORK.CLASS5                                             
                                                       (Method=EXACT)                                                           

                                                     Data Set Summary                                                           

                               Dataset               Created          Modified  NVar    NObs                                    

                               WORK.CLASS4  27SEP18:16:21:13  27SEP18:16:21:13     5      22                                    
                               WORK.CLASS5  27SEP18:16:21:13  27SEP18:16:21:13     5      22                                    


                                                     Variables Summary                                                          

                                           Number of Variables in Common: 5.                                                    


                                                    Observation Summary                                                         

                                               Observation      Base  Compare                                                   

                                               First Obs           1        1                                                   
                                               Last  Obs          22       22                                                   

                              Number of Observations in Common: 22.                                                             
                              Total Number of Observations Read from WORK.CLASS4: 22.                                           
                              Total Number of Observations Read from WORK.CLASS5: 22.                                           

                              Number of Observations with Some Compared Variables Unequal: 0.                                   
                              Number of Observations with All Compared Variables Equal: 22.                                     

                              NOTE: No unequal values were found. All values compared are exactly equal.