Question

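I have a dataset like this:

I only want to keep the IDs that appear exactly 3 times (i.e. keep id=5345 and id=5844) and delete the rest. How can I do this in SAS? My data is already sorted by id. I want to keep all three rows for each of those IDs in the output dataset.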

Answer 1

Using PROC SQL, you can create a new dataset with a JOIN, like this:

proc sql;
   create table want as
   select a.*
   from have a
   join (
      select id
      from   have
      group by id
      having count(*) = 3
      ) b
   on b.id=a.id;
quit;
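
As a possible shorter variant of the query above, SAS PROC SQL can usually "remerge" a summary statistic back onto the detail rows, so the subquery and join may not be needed. This is only a sketch: it assumes the same input table `have`, writes to a hypothetical table `want_remerge`, and assumes remerging has not been disabled (for example with the NOREMERGE option). The log should show a note that the query requires remerging summary statistics.

```sas
proc sql;
   /* Keep every row whose id group has exactly 3 rows.            */
   /* count(*) is computed per id and remerged onto detail rows.   */
   create table want_remerge as
   select *
   from have
   group by id
   having count(*) = 3;
quit;
```

The explicit join in the answer above is more portable, since remerging is a SAS-specific extension to SQL.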

Answer 2

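I'm not sure whether you only want a list of the IDs that appear 3 times, or all of the rows whose id is repeated 3 times. If you want the latter, @bellvueBob's code will do it.

Otherwise, here is one way to get the list of IDs that appear 3 times in the dataset. The advantage of this code is that it uses little memory and runs fast, because the dataset is already sorted.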

data threeobs(keep=id);
  set myid;
  by id;
  if first.id then cnt=1;
  else cnt+1;
  /* test on the last row of the group so ids with more than 3 rows are excluded */
  if last.id and cnt=3 then output;
run;
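
If all rows for those IDs are wanted rather than just the list, the same sorted-data idea can be extended with a two-pass loop over each BY group (often called a double DOW loop). This is a minimal sketch, not part of the original answer: it assumes the input is named `have` and is sorted by `id`, and the names `want`, `cnt`, and `i` are chosen only for illustration.

```sas
data want;
  /* Pass 1: count the rows in the current id group */
  cnt = 0;
  do until (last.id);
    set have;
    by id;
    cnt = cnt + 1;
  end;
  /* Pass 2: re-read the same rows and keep them only if the group has exactly 3 */
  do i = 1 to cnt;
    set have;
    if cnt = 3 then output;
  end;
  drop cnt i;
run;
```

Each SET statement keeps its own read position, so the second loop re-reads exactly the rows counted by the first. No separate merge step is needed.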

Answer 3

PROC FREQ will tell you this directly.

proc freq data=myid;
tables id/out=threeobs(keep=count id where=(count=3));
run;

If you mean 3 or more, use >= instead of =. Following up on the comments, here is an example that merges the result back onto the original data:

data have;
input id;
datalines;
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
;;;;
run;

proc freq data=have;
tables id/out=ids(where=(count=3) keep=id count);
run;

proc sort data=have;
by id;
run;
data want;
merge have(in=h) ids(in=i);
by id;
if i;
run;
