Randomly accessing n observations

Time: 2016-03-20 17:03:08

Tags: sas

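Suppose I have a data set with 50 variables and 50 observations.

Is it possible to access 100 random "cells" and change their values?

It would be great if I didn't have to use SQL.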

3 answers:

Answer 0 (score: 1)

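Your sampling rate is 0.04: 4% of the cells (100 of 2,500) will be set to missing. I've assumed your variables are all the same type and can be listed in an array; even if they aren't, there are ways around that. Another option is to flip the data into a long structure (one row per cell), use PROC SURVEYSELECT to select 100 random values, and set those to missing. The code below uses only Base SAS techniques.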

/*Generate sample data*/
data sample;
array var(50) var1-var50;
do i=1 to 50;
    do j=1 to 50;
        var(j)=rand('normal', 25, 4);
        end;
    output;
end;

drop i j;
run;

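*Randomly set cells to missing;
data sample_missing;
    if _n_=1 then call streaminit(123); *fixed seed makes the result reproducible;
    set sample;
    array var(50) var1-var50;
    rate=100/(50*50); *share of cells to set missing, based on your question;

    *Bernoulli draws give roughly 100 missing cells, never more than 100 because of the num_miss cap;
    retain num_miss 0;
    do i=1 to 50;
        if rand('bernoulli', rate) = 1 and num_miss < 100 then do;
            var(i)=.;
            num_miss+1;
        end;
    end;
    drop i;
run;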

/*Check the values and code*/
data check;
    set sample_missing end=eof;
    nmiss_row = nmiss(of var1--var50);
    nmiss_cum + nmiss_row; *sum statement: retained and initialized to 0;

    /*To see only the total number of missing values, uncomment the next two lines*/
    *if eof then output;
    *keep nmiss_cum;
run;
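
The PROC SURVEYSELECT alternative mentioned in this answer could look roughly like the sketch below. It is not part of the original answer and assumes SAS/STAT is available (SURVEYSELECT is a SAS/STAT procedure); the data set names long, picked, picked_sorted, long_missing and want_ss, the row/col/value variables and the seeds are illustrative choices. Unlike the Bernoulli step above, METHOD=SRS with SAMPSIZE=100 selects exactly 100 cells.

```sas
/* Sample data: 50 rows x 50 numeric variables (var1-var50), as in the question */
data sample;
  call streaminit(2016);
  array v(50) var1-var50;
  do i = 1 to 50;
    do j = 1 to 50;
      v(j) = rand('normal', 25, 4);
    end;
    output;
  end;
  drop i j;
run;

/* Flip to a long structure: one observation per cell */
data long;
  set sample;
  array v(50) var1-var50;
  row = _n_;
  do col = 1 to 50;
    value = v(col);
    output;
  end;
  keep row col value;
run;

/* Simple random sample of exactly 100 cells (requires SAS/STAT) */
proc surveyselect data=long out=picked method=srs sampsize=100 seed=123 noprint;
run;

/* Keep only the cell coordinates, sorted for the merge */
proc sort data=picked(keep=row col) out=picked_sorted;
  by row col;
run;

/* Set the selected cells to missing */
data long_missing;
  merge long picked_sorted(in=sel);
  by row col;
  if sel then value = .;
run;

/* Flip back to the wide layout var1-var50 */
proc transpose data=long_missing out=want_ss(drop=_name_ row) prefix=var;
  by row;
  id col;
  var value;
run;
```

The round trip through the long layout costs two extra steps, but it also works when the variables cannot be placed in a single array.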

Answer 1 (score: 1)

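If you only want 100 missing values, the straightforward brute-force approach is to treat your data as 2,500 cells: generate a list of 100 distinct random numbers between 1 and 2500, then set those cells to missing. Something like this: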

data sample;
  array x(50);
  do i=1 to 50;
    do j=1 to 50;
      x(j)=rand('normal', 25, 4);
    end;
    output;
  end;

  drop i j;
run;

**Generate list of 100 random numbers (there are doubtless better ways : );
data cellno;
  do cellno=1 to 2500;
    ran=ranuni(3);
    output;
  end;
run;
proc sql outobs=100 noprint;
  /* sorting by a random key and keeping the first 100 rows gives 100 distinct cell numbers */
  select cellno into :celllist separated by " "
  from cellno
  order by ran
  ;
quit;

%put &celllist;

*Use that list to set 100 cells to missing (the cell number for row _n_, column i is (_n_-1)*50+i);
data want; 
  set sample;
  array x(50);
  do i=1 to 50;
    if (_n_-1)*50+i IN (&celllist) then
      call missing(x{i});
  end;
  drop i;
run;
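
Since the question asks to avoid SQL, the same cell-number idea can also be sketched in Base SAS alone: shuffle the 2,500 cell numbers with PROC SORT on a random key, then load the first 100 into a temporary lookup array. This is a hypothetical variant, not the answerer's code; the names want2 and flag and the seeds are assumptions.

```sas
/* Sample data: 50 rows x 50 numeric variables x1-x50 */
data sample;
  call streaminit(2016);
  array x(50);
  do i = 1 to 50;
    do j = 1 to 50;
      x(j) = rand('normal', 25, 4);
    end;
    output;
  end;
  drop i j;
run;

/* One observation per cell number 1-2500, each with a random sort key */
data cellno;
  call streaminit(3);
  do cellno = 1 to 2500;
    ran = rand('uniform');
    output;
  end;
run;

/* Shuffle: after sorting by the random key, the first 100 rows are a random sample */
proc sort data=cellno;
  by ran;
run;

data want2;
  /* lookup table indexed by cell number; _temporary_ arrays are retained */
  array flag{2500} _temporary_;
  if _n_ = 1 then do p = 1 to 100;
    set cellno(keep=cellno) point=p;  /* read the first 100 shuffled cell numbers */
    flag{cellno} = 1;
  end;
  set sample;
  array x(50);
  do i = 1 to 50;
    /* cell number for row _n_, column i is (_n_ - 1)*50 + i */
    if flag{(_n_ - 1)*50 + i} = 1 then call missing(x{i});
  end;
  drop i cellno;
run;
```

The lookup array replaces the long IN list from the macro variable, so the check per cell is a single array reference.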

Answer 2 (score: 1)

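It is probably simplest to apply the K/N random sampling technique to this problem: select each item with probability K/N, where K is the number of items still needed and N is the number of items still remaining. The only difference is that instead of selecting observations, you select individual elements of the variable array.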

%let seed=12345;
%let varlist=X1-X50 ;
%let samplesize= 100 ;

data want;
  set have nobs=nobs ;
  array x &varlist ;
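  retain _count &samplesize ;  /* K: cells still to be selected */
  retain _left ;               /* N: cells not yet visited */
  if _n_=1 then _left=dim(x)*nobs ;
  do i=1 to dim(x);
    /* select this cell with probability K/N */
    if (_count/_left > ranuni(&seed)) then do;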
        x(i) = . ;
        _count = _count - 1;
    end;
    _left = _left - 1;
  end;
  drop _left _count i ;
run;
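
Why this selects exactly 100 cells: the step is a sequential (selection) sampling scheme. Writing k for _count (cells still to pick), r for _left (cells not yet visited, including the current one), N = 2500 and n = 100, each cell is picked with probability k/r. A short derivation, added here as an illustration:

```latex
\begin{aligned}
P(\text{cell } t \text{ selected} \mid k_t, r_t) &= \frac{k_t}{r_t},
  \qquad k_1 = n = 100,\; r_1 = N = 50 \times 50 = 2500 \\
k_t = r_t &\;\Rightarrow\; P = 1 \quad \text{(every remaining cell must be taken)} \\
k_t = 0 &\;\Rightarrow\; P = 0 \quad \text{(quota already filled)} \\
P(\text{cell 2 selected}) &= \frac{n}{N}\cdot\frac{n-1}{N-1} + \frac{N-n}{N}\cdot\frac{n}{N-1}
  = \frac{n(N-1)}{N(N-1)} = \frac{n}{N} = 0.04
\end{aligned}
```

Because the probability reaches 1 once k = r and drops to 0 once k = 0, the step can neither run out of cells nor overshoot, so exactly &samplesize cells are set to missing, each with the same marginal probability n/N.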