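When using PROC COMPARE in SAS, is it possible to list all of the duplicates it finds? By default, it only prints a message showing the first duplicate found and the total number of duplicates.
For example: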
data x1;
input x $ y $ z $ ;
datalines;
222 test abc
qqq test abc
aaa test abc
222 test abc
222 test abc
;
run;
data y1;
input x $ y $ z $ ;
datalines;
222 test abc
qqq test abc
aaa test abc
222 test abc
222 test abc
;
run;
***********************************;
*** sort data;
***********************************;
proc sort data=x1;
by x y;
run;
proc sort data=y1;
by x y;
run;
***********************************;
*** compare data;
***********************************;
proc compare listvar
base=x1
compare = y1;
id x y;
run;
************** END *****************;
Output:
The SAS System
The COMPARE Procedure
Comparison of WORK.X1 with WORK.Y1
(Method=EXACT)
Data Set Summary
Dataset    Created           Modified          NVar  NObs
WORK.X1    23OCT14:16:03:38  23OCT14:16:03:38     3     5
WORK.Y1    23OCT14:16:03:38  23OCT14:16:03:38     3     5
Variables Summary
Number of Variables in Common: 3.
Number of ID Variables: 2.
WARNING: The data set WORK.X1 contains a duplicate observation at observation
number 2.
NOTE: At observation 2 the current and previous ID values are:
x=222 y=test.
NOTE: Further warnings for duplicate observations in this data set will not be
printed.
WARNING: The data set WORK.Y1 contains a duplicate observation at observation
number 2.
NOTE: At observation 2 the current and previous ID values are:
x=222 y=test.
NOTE: Further warnings for duplicate observations in this data set will not be
printed.
Observation Summary
Observation  Base  Compare  ID
First Obs       1        1  x=222 y=test
Last Obs        5        5  x=qqq y=test
Number of Observations in Common: 5.
Number of Duplicate Observations found in WORK.X1: 2.
Number of Duplicate Observations found in WORK.Y1: 2.
Total Number of Observations Read from WORK.X1: 5.
Total Number of Observations Read from WORK.Y1: 5.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 5.
NOTE: No unequal values were found. All values compared are exactly equal.
@Joe - Thanks for your comment!
Answer 0 (score: 1)
PROC FREQ may be a good way to find the duplicates. You can then print them with PROC PRINT.
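/* x*y*z is the key for the question's data; substitute your own key variables */
PROC FREQ DATA=x1;
TABLES x*y*z / NOPRINT OUT=keylist;
RUN;
/* COUNT in KEYLIST is the number of rows for each key value */
PROC PRINT DATA=keylist;
WHERE count ge 2;
RUN;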
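If you need every duplicated row itself rather than one count per key, a DATA step with BY-group FIRST./LAST. flags is another option. This is a swapped-in technique, not part of the answer above. It is a minimal sketch, assuming the key is all three variables x, y and z from the question's X1 data set; the output names x1_sorted and x1_dups are hypothetical.

```sas
/* Sample data recreated from the question so the sketch runs on its own */
data x1;
  input x $ y $ z $;
  datalines;
222 test abc
qqq test abc
aaa test abc
222 test abc
222 test abc
;
run;

/* BY-group processing requires the data sorted by the key */
proc sort data=x1 out=x1_sorted;
  by x y z;
run;

/* A key value is unique only when its row is both the first and the   */
/* last row of its BY group; every other row belongs to a duplicate.   */
data x1_dups;
  set x1_sorted;
  by x y z;
  if not (first.z and last.z);
run;

proc print data=x1_dups;
run;
```

With the question's data, X1_DUPS holds the three `222 test abc` rows. The same steps can be repeated on Y1.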
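Answer 1 (score: 0)
I don't think there is a way to get the log or the listing to show more than the first duplicate if you are using the ID statement; that is simply how it works.
The best you can do is use the OUTALL option and write the results to a data set (if you are not doing so already). Then the duplicates are easy to see.
For example: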
data class2 class3;
set sashelp.class;
output;
output;
output class3;
run;
proc compare base=class2 compare=class3 out=outclass outall;
id name;
run;
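If the data are already sorted, you can also use a BY statement along with the ID statement. You will still get the duplicates, but each BY group gets its own report, so you will see the duplicates there.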
proc compare base=class2 compare=class3 out=outclass outall;
by name;
id name;
run;
Answer 2 (score: 0)
Finding the exact number of duplicates for each ID may be better suited to PROC SQL.
Something like:
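proc sql;
/* one row per x, y, z combination that occurs more than once, with its row count */
create table x2 as select
x, y, z,
count(*) as n_rows
from x1
group by x, y, z
having count(*) > 1;
quit;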
Running this against each data set (X1 and Y1) shows the duplicate rows in either one.