Aggregate by column value and paste row values in SAS

Time: 2015-04-20 17:24:07

Tags: sas concatenation aggregate-functions

My dataset looks like this:

Have:

data have;
input a b c d e f g h ;
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;

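Columns a, b, c and d are the four options for question 1, which uses a 4-point scale. A value of 1 in column a for obs 1 means the respondent chose option A for that question, and option A stands for 4 on the 4-point scale.

a = 4, b = 3, c = 2 and d = 1.

The options for the next question are e, f, g and h. The respondent chose option g, which is 2 on the 4-point scale. e = 4, f = 3, g = 2 and h = 1.

The dataset contains hundreds of such columns. My idea is to collapse each group of 4 columns into a single value such as "1000", "0100", "0010" or "0001", and then convert 1000 = 4, 0100 = 3, 0010 = 2 and 0001 = 1.

I would like it to look like: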

block   col1    col2    col3    col4
1       1000    0100    0010    0001
2       0100    0010    1000    0001
3       1000    0100    1000    0010

I have gotten this far:

proc transpose data = have out = have_t;
run;

data have_t_block;
set have_t;
retain block;
if _n_ = 1 then block = 1;
if mod(_n_/4,1) = 0.25 and _n_ gt 1 then block +1;
run;

Is there a way to concatenate row values while aggregating by block in SAS? In R I do it as follows:

#Create data    
data <- data.frame(a = c(1, 0, 0), b = c(0, 1, 0), c = c(0, 0, 1), d = c(0, 0, 0), e = c(0, 1, 0), f = c(1, 0, 0), g = c(0, 0, 1), h = c(0, 0, 0), i = c(0, 0, 1), j = c(1, 0, 0), k = c(0, 0, 0), l = c(0, 1, 0))

#transpose
data <- data.frame(t(data))

#create a key for each group of 4
data$block <- rep(1:(nrow(data)/4), each = 4)

#convert data to long format and group by key (block) and use paste to concatenate
require(reshape2)
data_melt <- melt(data, id = c("block"))
trial <- data.frame(t(dcast(data_melt, block ~ variable, paste, collapse = "")))

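2 answers:

Answer 0 (score: 1)

First, unless you have explained your data incorrectly, your transpose is not helping much here, since there is no particular reason to have one column per respondent - let's just have one column, period. Here is a better way to do it.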

data have_t;
  set have;
  array cols a--h;
  do _i = 1 to dim(cols);
    value = cols[_i];
    output;
  end;
  keep value; *and an ID I hope?;
run;

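Making the dataset 'vertical' (one column) is easy. Just loop over an array of all the columns, set a common variable to each column's value in turn, and output. Normally I would also keep track of the name of the variable I am outputting, but maybe that is not necessary here.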
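If the variable name and a respondent ID are wanted, the same loop could carry them along. This is only a sketch: `respondent`, `varname` and the dataset name `have_t_named` are hypothetical names not in the original, and it assumes each row of `have` is one respondent.

```sas
/* Sketch: vertical layout that also keeps a respondent ID and the source column name */
data have_t_named;
  set have;
  length varname $32;            /* hypothetical; 32 is the maximum SAS variable name length */
  respondent = _n_;              /* assumption: one input row per respondent */
  array cols a--h;
  do _i = 1 to dim(cols);
    varname = vname(cols[_i]);   /* name of the column currently being output */
    value = cols[_i];
    output;
  end;
  keep respondent varname value;
run;
```

Keeping the ID makes it possible to regroup the rows later without relying only on row order.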

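For your main question, what you want to do is use retain, most likely a little differently from the way you handled the blocking. Here I just calculate the score directly: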

data want;
  set have_t;
  retain score;
  counter = mod(_n_-1,4)+1; *position within the block: 1, 2, 3, 4;
  if counter=1 then block+1; *slightly easier version of what you wrote;
  if value=1 then score = 5-counter; *first=4, second=3, third=2, fourth=1;
  if counter=4 then output;
  *We never "clear" score here - to be safer you may want to do that in the if counter=1 block;
run;

If you want the intermediate '0010' or whatever, you can include that as well.

data want;
  set have_t;
  retain score int_Value;
  length int_Value $4;
  counter = mod(_n_-1,4)+1; *position within the block: 1, 2, 3, 4;
  if counter=1 then block+1; *slightly easier version of what you wrote;
  if value=1 then score = 5-counter; *first=4, second=3, third=2, fourth=1;
  int_value = cats(int_value,value);
  if counter=4 then do;
    output;
    int_value=' ';  *have to clear this every 4;
    score=.;  *here we might as well clear it; 
  end;
run;
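For comparison, the pattern and score could also be built directly from the wide data, without the vertical reshape. This is a different technique from the retain approach above, offered only as a sketch: the array names `score` and `pattern`, the work variables `_q`, `_k`, `_v`, and the dataset name `want_wide` are hypothetical, and the size 2 assumes the 8-column sample (2 questions of 4 options each).

```sas
/* Sketch: one row per respondent, one pattern and one score per question */
data want_wide;
  set have;
  array cols a--h;
  array score[2];                /* assumption: 8 columns = 2 questions */
  array pattern[2] $4;
  do _q = 1 to dim(score);
    do _k = 1 to 4;
      _v = cols[(_q-1)*4 + _k];  /* option _k of question _q */
      pattern[_q] = cats(pattern[_q], _v);
      if _v = 1 then score[_q] = 5 - _k;  /* first=4, second=3, third=2, fourth=1 */
    end;
  end;
  keep score: pattern:;
run;
```

For the first sample row (a = 1, g = 1) this should give pattern1 = "1000" with score1 = 4 and pattern2 = "0010" with score2 = 2. With hundreds of columns the array sizes would need to be adjusted accordingly.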

Answer 1 (score: 0)

If I understand your question correctly, try this:

data want;
  length var1-var4 $4; *declare the character targets before they are used in the array;
  do i=1 by 1 until(last.block);
    set have_t_block;
    by block notsorted;
    array var var1-var4;
    array col col1-col4;
    do over var;
      var=cats(var,col); *append this row's value to each respondent's pattern;
    end;
    if last.block then output;
  end;
  keep var: block;
run;
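The step above yields the patterns only. A possible follow-up that converts each pattern into the 4-point score is sketched below; the names `want_scored`, `pat`, `score`, `_j` and `_pos` are hypothetical, and it assumes each pattern contains at most one '1'.

```sas
/* Sketch: convert patterns such as "1000" into scores (1000=4, 0100=3, 0010=2, 0001=1) */
data want_scored;
  set want;
  array pat var1-var4;           /* patterns built in the previous step */
  array score[4];                /* hypothetical names score1-score4 */
  do _j = 1 to dim(pat);
    _pos = index(pat[_j], '1');  /* position of the selected option, 0 if none */
    if _pos > 0 then score[_j] = 5 - _pos;
  end;
  drop _j _pos;
run;
```

A pattern with no '1' (for example "0000") leaves the corresponding score missing.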