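Suppose I have a dataset with 50 variables and 50 observations.
Is it possible to pick 100 random "cells" and change their values?
It would be great if I didn't have to use SQL.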
Answer 0 (score: 1)
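Your sampling rate is 100/2500 = 0.04, so 4% of the cells will be set to missing. I have assumed that your variables are all of the same type and can be listed in an array. Even if they are not, there are ways around that. Another option is to flip the data to a long structure (one row per cell), use Proc SurveySelect to select 100 random values, and set those to missing. The following code uses only Base SAS techniques.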
/*Generate sample data*/
data sample;
    array var(50) var1-var50;
    do i=1 to 50;
        do j=1 to 50;
            var(j)=rand('normal', 25, 4);
        end;
        output;
    end;
    drop i j;
run;
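*Randomly set cells to missing;
data sample_missing;
    call streaminit(123); *fixed seed so the result is reproducible;
    set sample;
    array var(50) var1-var50;
    rate=100/(50*50); *target share of missing cells, based on your question;
    retain num_miss 0;
    do i=1 to 50;
        *each cell is blanked with probability rate, capped at 100 in total;
        *so the final count is close to 100 but not guaranteed to be exactly 100;
        if rand('bernoulli', rate) = 1 and num_miss < 100 then do;
            var(i)=.;
            num_miss+1;
        end;
    end;
    drop i rate num_miss;
run;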
/*Check the values and code*/
data check;
    set sample_missing end=eof;
    retain nmiss_cum;
    nmiss_row = nmiss(of var1-var50); *missing cells in this row;
    nmiss_cum+nmiss_row; *running total over all rows;
    /*To see only the total number of missing cells, uncomment the next two lines*/
    *if eof then output;
    *keep nmiss_cum;
run;
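The Proc SurveySelect alternative mentioned above is not shown in the answer. A minimal sketch of how it might look follows, assuming the sample dataset created earlier. The dataset names long, long_flagged and sample_missing_srs and the variables row, col and value are made up for illustration. Unlike the Bernoulli approach, simple random sampling with SAMPSIZE=100 draws exactly 100 cells.

```sas
/* Reshape to one row per cell: (row, col, value) */
data long;
    set sample;
    array v(50) var1-var50;
    row = _n_;
    do col = 1 to 50;
        value = v(col);
        output;
    end;
    keep row col value;
run;

/* Simple random sample of exactly 100 cells, without replacement.
   OUTALL keeps all 2500 rows and adds a 0/1 flag named Selected. */
proc surveyselect data=long out=long_flagged method=srs sampsize=100
                  seed=123 outall noprint;
run;

/* Make sure the rows are grouped by row before rebuilding */
proc sort data=long_flagged;
    by row col;
run;

/* Blank the selected cells and rebuild the wide layout */
data sample_missing_srs;
    array v(50) var1-var50;
    do until (last.row);
        set long_flagged;
        by row;
        if Selected then value = .;
        v(col) = value;
    end;
    keep var1-var50;
run;
```

The same check step as above can be pointed at sample_missing_srs to confirm the total.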
Answer 1 (score: 1)
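If you want exactly 100 missing values, the straightforward brute-force approach is to treat your data as 2500 cells. Generate a list of 100 distinct random numbers between 1 and 2500, then set those cells to missing. Something like this: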
data sample;
    array x(50);
    do i=1 to 50;
        do j=1 to 50;
            x(j)=rand('normal', 25, 4);
        end;
        output;
    end;
    drop i j;
run;
*Generate a list of 100 random cell numbers (there are doubtless better ways);
data cellno;
    do cellno=1 to 2500;
        ran=ranuni(3);
        output;
    end;
run;
*Sort by the random key and store the first 100 cell numbers in a macro variable;
proc sql outobs=100 noprint;
    select cellno into :celllist separated by " "
    from cellno
    order by ran
    ;
quit;
%put &celllist;
*Use that list to recode 100 cells to missing;
data want;
    set sample;
    array x(50);
    do i=1 to 50;
        *cell number = (row-1)*50 + column;
        if (_n_-1)*50+i IN (&celllist) then
            call missing(x{i});
    end;
    drop i;
run;
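Since the question asks to avoid SQL, the list of cell numbers could also be built without it. The following is only a sketch of one possible way: it shuffles the 2500 cell numbers with Proc Sort and concatenates the first 100 in a data step. The dataset name shuffled and the variable list are made up for illustration, and rand('uniform') with call streaminit is swapped in for ranuni.

```sas
/* Attach a random sort key to each of the 2500 cell numbers */
data cellno;
    call streaminit(3);
    do cellno = 1 to 2500;
        ran = rand('uniform');
        output;
    end;
run;

/* Shuffle the cell numbers by sorting on the random key */
proc sort data=cellno out=shuffled;
    by ran;
run;

/* Concatenate the first 100 shuffled cell numbers into &celllist.
   100 numbers of up to 4 digits plus separators fit easily in 600 bytes. */
data _null_;
    length list $600;
    retain list;
    set shuffled(obs=100) end=eof;
    list = catx(' ', list, cellno);
    if eof then call symputx('celllist', list);
run;
%put &celllist;
```

The resulting macro variable can be used in the IN (&celllist) test exactly as in the last step above.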
Answer 2 (score: 1)
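It is probably easiest to apply the K/N random-sampling technique to this problem. The only difference is that instead of selecting whole observations, you select individual elements of the variable array. Each cell is chosen with probability (cells still needed)/(cells not yet visited), so exactly 100 cells end up missing.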
%let seed=12345;
%let varlist=X1-X50;
%let samplesize=100;
*have is your input dataset;
data want;
    set have nobs=nobs;
    array x &varlist;
    retain _count &samplesize; *cells still to be set to missing;
    retain _left; *cells not yet visited;
    if _n_=1 then _left=dim(x)*nobs;
    do i=1 to dim(x);
        *blank this cell with probability _count/_left;
        if (_count/_left > ranuni(&seed)) then do;
            x(i) = .;
            _count = _count - 1;
        end;
        _left = _left - 1;
    end;
    drop _left _count i;
run;
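For comparison, here is a sketch of the K/N technique in its usual row-sampling form, which the code above adapts to array elements. It assumes the sashelp.class dataset that ships with SAS. The dataset name picked and the variables needed and remaining are made up for illustration.

```sas
/* Keep exactly 5 randomly chosen rows of sashelp.class */
data picked;
    set sashelp.class nobs=n;
    retain needed 5 remaining;
    if _n_ = 1 then remaining = n;
    /* Select this row with probability needed/remaining */
    if needed / remaining > ranuni(12345) then do;
        needed = needed - 1;
        output;
    end;
    remaining = remaining - 1;
    drop needed remaining;
run;
```

Once needed reaches 0 no further rows are selected, and when needed equals remaining every remaining row is selected, which is why the sample size comes out exact.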