Question

I have a SAS dataset, say df, that looks like this:

Input:
A B C D
1 . . .
2 . . .
3 0 1 1
4 1 0 1

Code to create the data:

data df;
  input A B C D;
  DATALINES;
1 . . .
2 . . .
3 0 1 1
4 1 0 1
;
run;

Now I want to delete the first 2 rows. The logic I need is: delete every row of df in which all values other than A are missing.

Output:
A B C D
3 0 1 1
4 1 0 1

I am new to SAS, and I am asking for an answer that does not use proc sql.

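Note: I have shown only 4 columns here. In reality I have more than 25 columns. I need a general answer that does not use the column names B, C, D.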

Answer 1

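You can use the CMISS() function, which works for both numeric and character variables. But you need to know how many variables there are.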

data have;
  input A B C $ D;
cards;
1 . . .
2 . . .
3 0 1 1
4 1 0 1
;

data want;
  set have;
  if cmiss(of B--D)<3 ;
run;
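If hardcoding the count of 3 is a concern, one possible variation is to look the variable count up at run time. This is only a sketch: it assumes the dataset is WORK.HAVE, that A is the only column to exclude, and the macro variable names dsid, nvars and rc are illustrative. ATTRN with the NVARS attribute returns the number of variables in the opened dataset.

```sas
/* Look up how many variables HAVE contains (assumes WORK.HAVE exists) */
%let dsid  = %sysfunc(open(have));
%let nvars = %sysfunc(attrn(&dsid, nvars));
%let rc    = %sysfunc(close(&dsid));

data want;
  set have;
  /* missing values outside A = all missing values minus A's own;
     keep the row unless every one of the other nvars-1 columns is missing */
  if cmiss(of _all_) - cmiss(A) < &nvars - 1;
run;
```

Because CMISS accepts both numeric and character arguments, this works for the mixed-type HAVE dataset above without listing B, C or D.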

Answer 2

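Put all the relevant variables in an array and count the number of non-missing values. Output all rows that have one or more non-missing values.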
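A minimal sketch of that idea, assuming the question's df dataset, that B through D are stored contiguously, and that they are all numeric (a SAS array cannot mix numeric and character variables). The names want, vars, _i and _nonmiss are illustrative only.

```sas
data want;
  set df;
  /* B--D is a positional range: every variable stored from B through D */
  array vars {*} B--D;
  _nonmiss = 0;
  do _i = 1 to dim(vars);
    if not missing(vars{_i}) then _nonmiss = _nonmiss + 1;
  end;
  /* subsetting IF: keep rows with at least one non-missing value */
  if _nonmiss > 0;
  drop _i _nonmiss;
run;
```

With more than 25 columns, only the first and last names in the range need to change.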

Answer 3

If all the variables are numeric, the N function will work, because it counts the number of non-missing values.

data have;
Input A B C D;
datalines;
1 . . .
2 . . .
3 0 1 1
4 1 0 1
;
run;

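/* MODIFY updates HAVE in place instead of writing a new dataset */
data have;
modify have;
if n(of B--D)=0 then remove;  /* no non-missing value from B through D */
run;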
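If the original dataset should be left untouched, a non-destructive variant could look like the sketch below. It assumes the question's df dataset with only numeric columns; the output name want is illustrative. It relies on the _numeric_ variable list, so B, C and D never have to be named.

```sas
data want;
  set df;
  /* _numeric_ covers every numeric variable read so far, including A,
     so A's own contribution is subtracted before testing */
  if n(of _numeric_) - n(A) > 0;
run;
```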

Answer 4

data result;
   set df;
   /* keep a row if at least one of B, C, D is non-missing */
   where ^missing(B) or ^missing(C) or ^missing(D);
run;

Or

proc sql noprint;
   create table result as
   select *
   from df
   where ^missing(B) or ^missing(C) or ^missing(D);
quit;

Edit:

proc contents data=df out=df_CONTENTS noprint; run;
proc sql noprint;
   select cats('^missing(',NAME,')') into :var_names separated by ' or '
   from df_CONTENTS
   where NAME ^= 'A';
quit;

Then you can use the value of 'var_names' as the condition in the where clause:

where &var_names;
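For completeness, a sketch of how the generated condition might be plugged into a data step. It assumes the macro variable var_names has already been created by the proc sql step above and reuses the result dataset name from this answer.

```sas
data result;
   set df;
   /* &var_names expands to the generated list of ^missing(...) tests */
   where &var_names;
run;
```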

Delete rows in a SAS dataset where all values except one column are missing

4 answers: