Question

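When using PROC COMPARE in SAS, is it possible to list all of the duplicates that were found? By default, a message is shown only for the first duplicate found, along with the total number of duplicates.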

For example:

data x1;
  input x $ y $ z $ ;
  datalines;
   222 test abc
   qqq test abc
   aaa test abc
   222 test abc
   222 test abc
   ;
run;


data y1;
  input x $ y $ z $ ;
  datalines;
   222 test abc
   qqq test abc
   aaa test abc
   222 test abc
   222 test abc
   ;
run;

***********************************;
*** sort data;
***********************************;

proc sort data=x1; 
   by x y; 
run;

proc sort data=y1; 
   by x y; 
run;

***********************************;
*** compare data;
***********************************;

proc compare listvar
   base=x1
   compare = y1;
   id x y;
run;

************** END *****************;

Output

The SAS System

                                  The COMPARE Procedure                                       
                            Comparison of WORK.X1 with WORK.Y1                                
                                      (Method=EXACT)                                          

                                    Data Set Summary                                          

                Dataset           Created          Modified  NVar    NObs                     

                WORK.X1  23OCT14:16:03:38  23OCT14:16:03:38     3       5                     
                WORK.Y1  23OCT14:16:03:38  23OCT14:16:03:38     3       5                     


                                    Variables Summary                                         

                          Number of Variables in Common: 3.                                   
                          Number of ID Variables: 2.                                          





             WARNING: The data set WORK.X1 contains a duplicate observation at observation    
                      number 2.                                                               
             NOTE: At observation 2 the current and previous ID values are:                   
                   x=222 y=test.                                                              
             NOTE: Further warnings for duplicate observations in this data set will not be   
                   printed.                                                                   
             WARNING: The data set WORK.Y1 contains a duplicate observation at observation    
                      number 2.                                                               
             NOTE: At observation 2 the current and previous ID values are:                   
                   x=222 y=test.                                                              
             NOTE: Further warnings for duplicate observations in this data set will not be   
                   printed.                                                                   







                                   Observation Summary                                        

                   Observation      Base  Compare  ID                                         

                   First Obs           1        1  x=222 y=test                               
                   Last  Obs           5        5  x=qqq y=test                               

             Number of Observations in Common: 5.                                             
             Number of Duplicate Observations found in WORK.X1: 2.                            
             Number of Duplicate Observations found in WORK.Y1: 2.                          
             Total Number of Observations Read from WORK.X1: 5.                               
             Total Number of Observations Read from WORK.Y1: 5.                               

             Number of Observations with Some Compared Variables Unequal: 0.                  
             Number of Observations with All Compared Variables Equal: 5.                     

             NOTE: No unequal values were found. All values compared are exactly equal.

@Joe - thanks for your comment!

Answer 1

PROC FREQ is probably a good way to find the duplicates. You can then print them with PROC PRINT.

PROC FREQ; 
 TABLES keyvar / noprint out=keylist;
RUN; 
PROC PRINT data=keylist; 
 WHERE count ge 2; 
RUN;
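
Applied to the data in the question, where the ID is the pair x and y, this might look like the sketch below. It assumes WORK.X1 from the question already exists; keylist is just the output name reused from the snippet above.

```sas
/* Count how many rows share each (x, y) ID combination in WORK.X1. */
proc freq data=x1 noprint;
  tables x*y / out=keylist;   /* keylist gets x, y, COUNT and PERCENT */
run;

/* Keep only the ID combinations that occur more than once. */
proc print data=keylist;
  where count ge 2;
run;
```

The same two steps can be repeated with data=y1 to check the compare data set.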

Answer 2

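I don't think there is a way to make the log or listing show more than the first duplicate, at least when you are using an ID statement, as you are here.

The best you can do is use the OUTALL option and write the results to a data set (if you are not doing so already). The duplicates are then easy to see.

For example: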

data class2 class3;
  set sashelp.class;
    output;
    output;
    output class3;
run;

proc compare base=class2 compare=class3 out=outclass outall;
  id name;
run;

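If the data is already sorted, you can also use a BY statement together with the ID statement. You will still get the duplicates, but each BY group gets its own report, so you will see the duplicates there.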

proc compare base=class2 compare=class3 out=outclass outall;
  by name;
  id name;
run;
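
One way to look at the result is to print the OUT= data set directly. The sketch below assumes the outclass data set created by the PROC COMPARE steps above. _TYPE_ and _OBS_ are the variables PROC COMPARE adds to the output data set; the WHERE clause keeps only the rows copied from the base and compare data sets, and the VAR list is just an illustrative choice of columns.

```sas
/* Print the base and compare rows side by side, grouped by the ID value. */
/* Repeated _TYPE_ values for one name point to duplicate IDs.            */
proc print data=outclass;
  where _type_ in ('BASE', 'COMPARE');
  by name;
  var _type_ _obs_ age height weight;
run;
```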

Answer 3

Finding the exact number of duplicates for each ID is probably better suited to PROC SQL.

Something like:

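proc sql;
  /* count(*) counts the rows in each (x, y, z) group;      */
  /* the original id_var does not exist in x1               */
  create table x2 as
  select *,
         count(*) as n_rows
  from x1
  group by x, y, z;
quit;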

This should show any duplicate rows; run it against either data set.
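
To list only the ID values that are actually duplicated, a HAVING clause can filter the groups. This is a sketch that assumes WORK.X1 from the question exists and groups by the two ID variables x and y used in the PROC COMPARE step; n_rows is an illustrative column name.

```sas
proc sql;
  /* One row per (x, y) ID combination that occurs more than once. */
  select x, y, count(*) as n_rows
  from x1
  group by x, y
  having count(*) > 1;
quit;
```

Swapping x1 for y1 gives the same listing for the compare data set.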

SAS - Proc Compare - Show all duplicates
