I want to remove all blank observations from a data set. I only know how to get rid of blanks for a single variable:
data a;
set data(where=(var1 ne .)) ;
run;
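Here I create a new data set without the rows where var1 is blank. But how do I do this when I want to get rid of ALL the blanks in the whole data set?
Thanks in advance for your answers.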
Answer 0 (score: 15)
If you are trying to get rid of rows where ALL variables are missing, it's quite easy:
/* Create an example with some or all columns missing */
data have;
set sashelp.class;
if _N_ in (2,5,8,13) then do;
call missing(of _numeric_);
end;
if _N_ in (5,6,8,12) then do;
call missing(of _character_);
end;
run;
/* This is the answer */
data want;
set have;
if compress(cats(of _all_),'.')=' ' then delete;
run;
You can also set OPTIONS MISSING=' '; beforehand instead of using COMPRESS.
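A minimal sketch of that OPTIONS MISSING=' ' variant, reusing the `have` data set created above. It assumes the MISSING= option was at its default of '.' before the step and restores it to that value afterwards; if your session uses a different setting, restore that one instead.

```sas
/* Print numeric missing values as a blank instead of a period */
options missing=' ';

data want;
set have;
/* With MISSING=' ', an all-missing row concatenates to an empty string */
if cats(of _all_)=' ' then delete;
run;

/* Assumption: the previous setting was the default '.' */
options missing='.';
```

The COMPRESS call is no longer needed because the missing numerics do not contribute any periods to the concatenated string.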
If you want to remove ALL rows that contain ANY missing value, you can use the NMISS/CMISS functions. For the numeric variables only:
data want;
set have;
if nmiss(of _numeric_) > 0 then delete;
run;
or
data want;
set have;
if nmiss(of _numeric_) + cmiss(of _character_) > 0 then delete;
run;
for all character and numeric variables.
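As a hedged alternative to summing the two counts: CMISS accepts both numeric and character arguments, so a single call over `_all_` should give the same result. This is a sketch against the same `have` data set, not part of the original answer.

```sas
data want;
set have;
/* CMISS counts missing values among numeric and character arguments alike */
if cmiss(of _all_) > 0 then delete;
run;
```

This form also avoids referencing `_character_` or `_numeric_` separately, which can be convenient when a data set happens to contain only one of the two types.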
Answer 1 (score: 6)
You can do something like this:
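data myData;
set myData;
/* Array over every numeric variable in the data set */
array a(*) _numeric_;
do i=1 to dim(a);
/* missing() also catches special missing values such as .A */
if missing(a(i)) then delete;
end;
drop i;
run;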
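This scans all numeric variables and deletes any observation in which a missing value is found.
Answer 2 (score: 1)
Here you go. This works regardless of whether the variables are character or numeric.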
data withBlanks;
input a$ x y z;
datalines;
a 1 2 3
b 1 . 3
c . . 3
. . .
d . 2 3
e 1 . 3
f 1 2 3
;
run;
%macro removeRowsWithMissingVals(inDsn, outDsn, Exclusion);
/*Inputs:
inDsn: Input dataset in which some or all columns are missing for some or all rows
outDsn: Output dataset containing only the rows that pass the missing-value filter
Exclusion: Should be one of {AND, OR}. AND excludes a row if any column is missing; OR excludes a row only if all columns are missing
*/
/*get a list of variables in the input dataset along with their types (i.e., whether they are numeric or character type)*/
PROC CONTENTS DATA = &inDsn OUT = CONTENTS(keep = name type varnum);
RUN;
/*put each variable with its own comparison string in a separate macro variable*/
data _null_;
set CONTENTS nobs = num_of_vars end = lastObs;
/*use NE. for numeric cols (type=1) and NE '' for char types*/
if type = 1 then call symputx(compress("var"!!varnum), compbl(name!!" NE . "));
else call symputx(compress("var"!!varnum), compbl(name!!" NE '' "));
/*make a note of no. of variables to check in the dataset*/
if lastObs then call symputx("no_of_obs", _n_);
run;
DATA &outDsn;
set &inDsn;
where
%do i =1 %to &no_of_obs.;
&&var&i.
%if &i < &no_of_obs. %then &Exclusion;
%end;
;
run;
%mend removeRowsWithMissingVals;
%removeRowsWithMissingVals(withBlanks, withOutBlanksAND, AND);
%removeRowsWithMissingVals(withBlanks, withOutBlanksOR, OR);
Output of withOutBlanksAND:
a x y z
a 1 2 3
f 1 2 3
Output of withOutBlanksOR:
a x y z
a 1 2 3
b 1 . 3
c . . 3
e 1 . 3
f 1 2 3