My dataset looks like this:
Have:
data have;
input a b c d e f g h ;
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;
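Columns a, b, c and d are the four options of question 1, which uses a 4-point scale. The value "1" in column a of obs 1 means the respondent chose option a for that question, which represents 4 on the 4-point scale:
a = 4, b = 3, c = 2 and d = 1.
The options for the next question are e, f, g and h. Here the respondent chose option g, which is 2 on the 4-point scale: e = 4, f = 3, g = 2 and h = 1.
The dataset contains hundreds of such columns. My idea is to collapse each group of 4 columns into a single value such as "1000", "0100", "0010" or "0001", and then convert 1000 = 4, 0100 = 3, 0010 = 2 and 0001 = 1.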
I want it to look like this:
block col1 col2 col3 col4
1 1000 0100 0010 0001
2 0100 0010 1000 0001
3 1000 0100 1000 0010
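For comparison, the collapse-and-convert idea above can be sketched directly on the wide data, without any transpose. This is a minimal sketch assuming every question occupies exactly four adjacent columns ordered 4, 3, 2, 1 (a-d, e-h, ...); the output names PATTERN1-PATTERN2 and SCORE1-SCORE2 are hypothetical, and with hundreds of columns the array size 2 would become the number of columns divided by 4.

```sas
/* Sketch: one pattern and one score per question, built from the wide data.
   Assumes 4 adjacent columns per question, ordered 4,3,2,1.
   PATTERN1-2 and SCORE1-2 are hypothetical names. */
data have;
input a b c d e f g h;
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;

data want;
  set have;
  array opt[*] a--h;                     /* all option columns, question by question */
  array pat[2] $4 pattern1-pattern2;     /* one pattern per question: 2 = 8 columns / 4 */
  array sc[2] score1-score2;             /* one score per question */
  do q = 1 to dim(sc);
    do k = 1 to 4;
      pat[q] = cats(pat[q], opt[(q-1)*4 + k]);      /* builds e.g. "1000" */
      if opt[(q-1)*4 + k] = 1 then sc[q] = 5 - k;   /* 1st=4, 2nd=3, 3rd=2, 4th=1 */
    end;
  end;
  drop q k;
run;
```

Each row keeps one respondent, so there is no need to track blocks at all; the block structure is implied by the array arithmetic.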
This is as far as I've got:
proc transpose data = have out = have_t;
run;
data have_t_block;
set have_t;
retain block;
if _n_ = 1 then block = 1;
if mod(_n_/4,1) = 0.25 and _n_ gt 1 then block +1;
run;
Is there a way in SAS to concatenate row values while aggregating by block? In R I do it like this:
#Create data
data <- data.frame(a = c(1, 0, 0), b = c(0, 1, 0), c = c(0, 0, 1), d = c(0, 0, 0), e = c(0, 1, 0), f = c(1, 0, 0), g = c(0, 0, 1), h = c(0, 0, 0), i = c(0, 0, 1), j = c(1, 0, 0), k = c(0, 0, 0), l = c(0, 1, 0))
#transpose
data <- data.frame(t(data))
#create a key for each group of 4
data$block <- rep(1:(nrow(data)/4), each = 4)
#convert data to long format and group by key (block) and use paste to concatenate
require(reshape2)
data_melt <- melt(data, id = c("block"))
trial <- data.frame(t(dcast(data_melt, block ~ variable, paste, collapse = "")))
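Answer 0 (score: 1)
First, unless you've misdescribed your data, your transpose doesn't really help here: there is no particular reason to have one column per respondent, so let's just have a single column, period. Here's a better approach.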
data have_t;
set have;
array cols[*] a--h;
do _i = 1 to dim(cols);
value = cols[_i];
output;
end;
keep value; *and an ID I hope?;
run;
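Making the dataset 'vertical' (one column) is easy. Just loop over an array of all the columns, set one common variable to each value in turn, and output. Normally I would also keep track of which variable name each value came from, but maybe that isn't needed here.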
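If the respondent and the source column are needed later, the same loop can carry them along. A minimal sketch, assuming HAVE has one row per respondent; RESPONDENT and VARNAME are hypothetical names, and VNAME returns the name of the variable an array element refers to.

```sas
/* Sketch: verticalize HAVE while keeping a respondent ID and the column name.
   RESPONDENT and VARNAME are hypothetical names; assumes one row per respondent. */
data have;
input a b c d e f g h;
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;

data have_t;
  set have;
  respondent = _n_;                  /* row number in HAVE used as the respondent ID */
  length varname $32;
  array cols[*] a--h;
  do _i = 1 to dim(cols);
    varname = vname(cols[_i]);       /* which option column this value came from */
    value   = cols[_i];
    output;                          /* one output row per respondent x column */
  end;
  keep respondent varname value;
run;
```

With 4 respondents and 8 columns this yields 32 rows, still in groups of four per question, so the RETAIN logic below applies unchanged.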
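For your main question, what you want is RETAIN, most likely combined with a somewhat different way of handling the blocks. Here I just calculate the score directly: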
data want;
set have_t;
retain score;
counter = mod(_n_-1,4)+1; *position within the group of 4: 1, 2, 3, 4;
if counter=1 then block+1; *slightly easier version of what you wrote;
if value=1 then score = 5-counter; *first=4, second=3, third=2, fourth=1;
if counter=4 then output;
*We never "clear" score here - to be safer you may want to do that in the if counter=1 block;
run;
If you want the intermediate value ('0010' or whatever) as well, you can include it too:
data want;
set have_t;
length int_value $4;
retain score int_value;
counter = mod(_n_-1,4)+1; *position within the group of 4: 1, 2, 3, 4;
if counter=1 then block+1; *slightly easier version of what you wrote;
if value=1 then score = 5-counter; *first=4, second=3, third=2, fourth=1;
int_value = cats(int_value,value);
if counter=4 then do;
output;
int_value=' '; *have to clear this every 4;
score=.; *here we might as well clear it;
end;
run;
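Once the intermediate pattern exists, the lookup described in the question (1000 = 4, 0100 = 3, 0010 = 2, 0001 = 1) could also be done with a numeric informat instead of arithmetic. A sketch, assuming the patterns are 4-character strings; the informat name PATSCORE and the dataset CHECK are hypothetical, and OTHER = . turns unexpected patterns (nothing ticked, or two options ticked) into missing values.

```sas
/* Sketch: map 4-character patterns to scores with a numeric informat.
   PATSCORE and CHECK are hypothetical names. */
proc format;
  invalue patscore
    '1000' = 4
    '0100' = 3
    '0010' = 2
    '0001' = 1
    other  = .;
run;

data check;
  input int_value $4.;
  score = input(int_value, patscore.);   /* unexpected patterns become missing */
  datalines;
1000
0100
0010
0001
1100
;
run;
```

The last row ("1100") shows how an invalid response surfaces as a missing score rather than a wrong number.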
Answer 1 (score: 0)
If I understand your question correctly, try this:
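data want;
length var1-var4 $4; *one 4-character pattern per respondent column;
array pat[4] $ var1-var4; *array not named VAR, which is also a SAS function name;
array col[4] col1-col4;
do until(last.block); *DOW loop: one DATA step iteration per block;
set have_t_block;
by block notsorted;
do j=1 to 4;
pat[j]=cats(pat[j],col[j]); *append this row's 0/1 to the pattern;
end;
end;
output;
keep block var:;
run;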
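To finish with scores rather than patterns, one option is to use the position of the single '1' in each pattern: position 1 gives 4, position 4 gives 1. A sketch, assuming exactly one '1' per valid pattern; the input rows below are the block patterns this approach produces for the sample data, and SCORE1-SCORE4 and the dataset names are hypothetical.

```sas
/* Sketch: score = 5 - position of the '1' in each pattern.
   SCORE1-SCORE4, PATTERNS and SCORED are hypothetical names. */
data patterns;
  input block (var1-var4) (:$4.);
  datalines;
1 1000 0010 0001 0100
2 0010 1000 0100 0001
;
run;

data scored;
  set patterns;
  array var_[4] $ var1-var4;
  array score_[4] score1-score4;
  do j = 1 to 4;
    p = index(var_[j], '1');                    /* position of the chosen option, 0 if none */
    if p > 0 and countc(var_[j], '1') = 1 then  /* skip blank or multi-ticked patterns */
      score_[j] = 5 - p;                        /* 1st=4, 2nd=3, 3rd=2, 4th=1 */
  end;
  drop j p;
run;
```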