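My source data contains more than 200,000 observations, and one of the many variables in the data set is "county". My goal is to write a macro that takes this one data set as input and splits it into 58 separate temporary data sets, one for each California county.
The first question is whether it is possible to use something like a previously defined global reference array to specify the 58 counties in the DATA statement.
The second question is: assuming the output data sets are correctly specified in the DATA statement, can a DO loop be used to select the correct data set to write to?
I can get the comparison to work, but I can't seem to use an array reference to specify the output data set. This is most likely because I need more experience with the macro environment!
Please see the simple skeleton I have written so far below. The c_long array contains the name of each county, and the c_short array contains each county's 3-letter abbreviation. Thanks in advance!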
data splitraw;
length county_name $15;
infile "&path/random.csv" dsd firstobs=2;
input county_name $ number;
run;
%macro _58countysplit(dxtosplit,countycol);
data <need to specify 58 data sets here named something like &dxtosplit._ALA, &dxtosplit._ALP, etc.>;
set &dxtosplit;
do i=1 to 58;
if c_long{i}=&countycol then output &dxtosplit._&c_short{i};
end;
run;
%mend _58countysplit;
%_58countysplit(splitraw,county_name);
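Regarding the two questions: the data set names in a DATA statement and in an OUTPUT statement are fixed when the step is compiled, so values held in a DATA step array such as c_short cannot name them at run time. The names have to be written out as text before the step compiles, which the macro language can do with %DO loops. A minimal sketch, assuming the abbreviations and full county names are kept in two |-delimited macro variables instead of arrays (short_list and long_list are hypothetical names, and only three counties are listed for brevity):

```sas
/* Hypothetical lookup lists: one entry per county, in the same order,  */
/* delimited by | so that names containing spaces stay intact.          */
%let short_list = ALA|ALP|SLO;
%let long_list  = Alameda|Alpine|San Luis Obispo;

%macro _58countysplit(dxtosplit, countycol);
  %local i n;
  /* Number of counties in the lookup list */
  %let n = %sysfunc(countw(&short_list, |));

  /* Generate one output data set name per county, e.g. splitraw_ALA */
  data
    %do i = 1 %to &n;
      &dxtosplit._%scan(&short_list, &i, |)
    %end;
    ;
    set &dxtosplit;
    /* Generate one WHEN clause per county; unmatched rows are not written */
    select (&countycol);
      %do i = 1 %to &n;
        when ("%scan(&long_list, &i, |)") output &dxtosplit._%scan(&short_list, &i, |);
      %end;
      otherwise;
    end;
  run;
%mend _58countysplit;
```

With the question's splitraw data set, the call %_58countysplit(splitraw, county_name) would create splitraw_ALA, splitraw_ALP and splitraw_SLO in WORK in a single pass over the input. Extending the two lists to all 58 counties is the only change needed for the full split.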
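Answer 0 (score: 1)
The code you provided would have to pass through the large data set 58 times, writing one small data set each time. I did it a bit differently. First, I create a sample data set with a variable "county" that takes ten different values: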
data large;
attrib county length=$12;
do i=1 to 10000;
county=put(mod(i,10)+1,ROMAN.);
output;
end;
run;
Then I find all the distinct values and build the names of all the different tables I want to create:
proc sql noprint;
select distinct compbl("large_"!!county) into :counties separated by " "
from large;
quit;
Now I have a macro variable "counties" that contains the names of all the data sets I want to create.
Here I write the IF statements to a file:
filename x temp;
data _null_;
attrib county length=$12 ds length=$18;
file x;
i=1;
do while(scan("&counties",i," ") ne "");
ds=scan("&counties",i," ");
county=scan(ds,-1,"_");
put "if county=""" county +(-1) """ then output " ds ";";
i+1;
end;
run;
Now I have everything I need to create the small data sets:
data &counties;
set large;
%inc x;
run;
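Answer 1 (score: 0)
I agree with user667489 that there is almost always a better way than splitting one large data set into many small ones. However, if you want to go down that route, there is a view named VCOLUMN in SASHELP that lists every library, the tables in each library, and every column in each table, which should help you. If you want the line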
if c_long{i}=&countycol then output &dxtosplit._&c_short{i};
to resolve, you probably mean:
if c_long{i}=&countycol then output &&dxtosplit._&c_short{i};
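A minimal sketch of querying the SASHELP.VCOLUMN view mentioned in this answer, using SASHELP.CLASS (a sample data set shipped with SAS) as the example table. To inspect your own data, substitute its library and member names in upper case, since the view stores LIBNAME and MEMNAME that way:

```sas
/* List the columns of SASHELP.CLASS from the dictionary view.      */
/* LIBNAME and MEMNAME are stored in upper case in SASHELP.VCOLUMN. */
proc sql;
  select name, type, length
    from sashelp.vcolumn
    where libname = 'SASHELP' and memname = 'CLASS'
    order by varnum;
quit;
```

The same WHERE clause against SASHELP.VTABLE lists the tables in a library instead of the columns of one table.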
Answer 2 (score: 0)
Depending on what you actually want to do, BY-group processing may well be what you need. Still, here is a simple solution:
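%macro split_by(data=, splitvar=);
%local dslist iflist;
proc sql noprint;
/* One output data set name per distinct value, e.g. county_A */
select distinct cats("&splitvar._", &splitvar)
into :dslist separated by ' '
from &data;
/* One IF ... THEN OUTPUT statement per distinct value, chained with ELSE */
select distinct
catt("if &splitvar='", &splitvar, "' then output &splitvar._", &splitvar, ";", '0A'x)
into :iflist separated by "else "
from &data;
quit;
/* Write every observation to the data set for its value in one pass */
data &dslist;
set &data;
&iflist
run;
%mend split_by;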
Here is some test data:
options mprint;
data test;
length county $1;
input county val;
infile cards;
datalines;
A 2
B 3
A 5
C 8
C 9
D 10
;
run;
%split_by(data=test, splitvar=county)
You can check the log to see how the macro generates the DATA step you want:
MPRINT(SPLIT_BY): proc sql noprint;
MPRINT(SPLIT_BY): select distinct cats("county_", county) into :dslist separated by ' ' from test;
MPRINT(SPLIT_BY): select distinct catt("if county='", county, "' then output county_", county, ";", '0A'x) into :iflist separated
by "else " from test;
MPRINT(SPLIT_BY): quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
MPRINT(SPLIT_BY): data county_A county_B county_C county_D;
MPRINT(SPLIT_BY): set test;
MPRINT(SPLIT_BY): if county='A' then output county_A;
MPRINT(SPLIT_BY): else if county='B' then output county_B;
MPRINT(SPLIT_BY): else if county='C' then output county_C;
MPRINT(SPLIT_BY): else if county='D' then output county_D;
MPRINT(SPLIT_BY): run;
NOTE: There were 6 observations read from the data set WORK.TEST.
NOTE: The data set WORK.COUNTY_A has 2 observations and 2 variables.
NOTE: The data set WORK.COUNTY_B has 1 observations and 2 variables.
NOTE: The data set WORK.COUNTY_C has 2 observations and 2 variables.
NOTE: The data set WORK.COUNTY_D has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.05 seconds
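As this answer notes, BY-group processing can often replace the split entirely: most procedures accept a BY statement and handle each county separately within one data set. A minimal sketch on the same test values, assuming a per-county count and sum is the kind of result wanted (the data set names have and county_sums are illustrative):

```sas
/* Same values as the test data above, with VAL read as numeric */
data have;
  input county $ val;
  datalines;
A 2
B 3
A 5
C 8
C 9
D 10
;
run;

/* BY processing requires the data to be sorted by the BY variable */
proc sort data=have;
  by county;
run;

/* One summary row per county, without creating one data set per county */
proc means data=have noprint;
  by county;
  var val;
  output out=county_sums n=n_obs sum=total_val;
run;

proc print data=county_sums;
run;
```

Because the data are sorted once and every later step simply adds BY county, this keeps a single data set instead of 58, which is usually easier to maintain.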