In SAS, I have a variable V containing the following value:
V=1996199619961996200120012001
I would like to create these two variables:
V1=19962001 (= the distinct modalities)
V2=42 (= the first modality appears 4 times and the second one appears 2 times)
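Any ideas?
Thanks for your help.
Luc
Answer 0 (score: 1)
For your first question (if I understand the pattern correctly), you can extract the first four characters and the last four characters: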
a = substr(variable, 1, 4);
b = substrn(variable, max(1, length(variable) - 3), 4);
Then you can concatenate the two:
c = cats(a, b);
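Put together as a complete DATA step, those statements might look like the sketch below. The data set name `want` and the variable name `variable` are placeholders, and the approach assumes the string holds exactly two distinct four-character values, the first at the start of the string and the second at the end.

```sas
/* Sketch only: assumes exactly two distinct 4-character values. */
data want;
    length variable $40 a b $4 c $8;
    variable = '1996199619961996200120012001';
    a = substr(variable, 1, 4);                              /* first four characters */
    b = substrn(variable, max(1, length(variable) - 3), 4);  /* last four characters */
    c = cats(a, b);                                          /* joined, blanks stripped */
    put a= b= c=;
run;
```

For the value in the question this writes a=1996 b=2001 c=19962001 to the log.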
For the second one, the COUNT function can be used to count the number of times a substring occurs within a string.
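As a minimal sketch of that idea, the step below counts each of the two values extracted with SUBSTR and SUBSTRN. The names `want`, `n_a`, `n_b` and `v2` are illustrative only. COUNT matches the substring at any position, not just on four-character boundaries, so on other data it could overcount; it also assumes each count fits in a single digit.

```sas
/* Sketch only: counts the first and last 4-character values with COUNT. */
data want;
    length variable $40 a b $4 v2 $2;
    variable = '1996199619961996200120012001';
    a = substr(variable, 1, 4);
    b = substrn(variable, max(1, length(variable) - 3), 4);
    n_a = count(variable, a);   /* occurrences of the first value */
    n_b = count(variable, b);   /* occurrences of the last value */
    v2 = cats(n_a, n_b);        /* assumes single-digit counts */
    put n_a= n_b= v2=;
run;
```

For the value shown in the question this writes n_a=4 n_b=3 v2=43 to the log.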
Hope this helps :)
Answer 1 (score: 1)
* Make it a bit more general ;
%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;
* Where does a given occurrence start? ;
%macro occurStart(occurNo);
&modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
* Read the input ;
data simplified ;
infile datalines truncover;
input v $%eval(&modeLength.*&maxOccur.).;
* Declare output and work variables ;
format what $&modeLength..
v1 $%eval(&modeLength.*&maxModes.).
v2 $&maxModes..;
array w {&maxModes.} $ &modeLength.; ** what (character) **;
array c {&maxModes.}; ** count **;
* Discover the unique modes and count them ;
countW = 0;
do vNo = 1 to length(v)/&modeLength.;
what = substr(v, %occurStart(vNo), &modeLength.);
do wNo = 1 to countW;
if what eq w(wNo) then do;
c(wNo) = c(wNo) + 1;
goto foundIt;
end;
end;
countW = countW + 1;
w(countW) = what;
c(countW) = 1;
foundIt: ;
end;
* Report the results in v1 and v2 ;
do wNo = 1 to countW;
substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
substr(v2, wNo, 1) = put(c(wNo),1.);
put _N_= v1= v2=;
end;
keep v1 v2;
* The data I tested with ;
datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;