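So I'm trying to break up a large dataset (70,000 observations with 1,790 variables) into pieces based on specific groupings of variables. Excel or CSV would be the ideal export format, but there is a limit on the number of variables (260 or so). Any ideas how I could do this in SAS (or alternatively in R/SQL)?
I know macros work; I've used them before. The error message says that the limit on the number of variables has been reached.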
Answer 0 (score: 5)
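There are certainly limits when creating Excel files (the older .xls format allows at most 256 columns per sheet), but not when creating CSV files. Here is an example using a dummy SAS data set: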
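/* Dummy data set A: 5 observations of 1,790 random numeric columns.
   The loop counters j and i are not dropped, so A has 1,792 variables. */
data a;
    array x(*) x1-x1790;
    do j=1 to 5;
        do i=1 to dim(x);
            x(i) = ranuni(0);
        end;
        output;
    end;
run;

/* Write A to CSV; the first record holds the variable names */
proc export data=a
    outfile="c:\temp\tempfile.csv"
    dbms=CSV
    replace;
run;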
Here is the relevant log:
NOTE: The file 'c:\temp\tempfile.csv' is:
Filename=c:\temp\tempfile.csv,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=23Jan2013:15:27:13,
Create Time=23Jan2013:15:27:13
NOTE: 6 records were written to the file 'c:\temp\tempfile.csv'.
The minimum record length was 9636.
The maximum record length was 23087.
NOTE: There were 5 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
real time 0.26 seconds
cpu time 0.09 seconds
5 records created in c:\temp\tempfile.csv from A.
NOTE: "c:\temp\tempfile.csv" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 2.04 seconds
cpu time 0.26 seconds
Note that the first record contains the column headers, which is why 6 records were written for 5 observations.
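To confirm that no columns were lost, you could read the header line back and count its fields. This is a sketch that assumes the path used in the example above. It should report nfields=1792, because A holds x1-x1790 plus the loop counters j and i; that header line is also the 9,636-byte minimum record length shown in the log.

```sas
/* Read only the first (header) line of the CSV written above
   and count its comma-separated fields. */
data _null_;
    infile "c:\temp\tempfile.csv" lrecl=100000 obs=1;
    input;
    nfields = countw(_infile_, ',');
    put nfields=;
run;
```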
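Update: if you have a recent version of SAS (9.3 TS1M1 or later), you can create an Office 2010 Excel spreadsheet, which can hold up to 1,048,576 rows and 16,384 columns. In that case you would use DBMS=XLSX.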
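A minimal sketch of the DBMS=XLSX variant, reusing data set A from the example above. The .xlsx path and the sheet name are placeholders, and this assumes SAS/ACCESS Interface to PC Files is available, since PROC EXPORT relies on it for Excel output.

```sas
/* Same export as above, but to an .xlsx workbook (Excel 2007+ format).
   The path and sheet name are placeholders. */
proc export data=a
    outfile="c:\temp\tempfile.xlsx"
    dbms=xlsx
    replace;
    sheet="dummy";
run;
```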
Answer 1 (score: 1)
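Assuming you just want to put each block of up to 255 columns into its own sheet or file, and split the records at the midpoint (records 1 to 35,000 in file A and 35,001 to the end in file B, for each group of variables), you would do something like this: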
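options mprint symbolgen;   /* show generated code and macro variable resolution in the log */

/* Test data: 70,000 rows of 1,700 random columns plus an ID */
data test;
    array xs x1-x1700;
    do id = 1 to 70000;
        do _t = 1 to dim(xs);
            xs[_t]=ranuni(7);
        end;
        output;
    end;
    drop _t;
run;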
/* Export a contiguous block of columns (plus optional KEEPLIST variables)
   and a range of rows from LIBNAME.DSET to one sheet of an Excel file.
   Give the column range either by name (VARSTART=/VAREND=) or by
   position (VARNUMSTART=/VARNUMEND=). RECEND=0 means "to the last row". */
%macro export_file(varstart=,varend=,varnumstart=0,varnumend=0,recstart=1,recend=0,keeplist=,dset=, libname=WORK, outfile=,sheet="sheet1");
    /* Look up the variable names at the requested column positions */
    %if &varnumstart ne 0 %then %do;
        proc sql noprint;
            select name into :varstart from dictionary.columns
                where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumstart.;
            select name into :varend from dictionary.columns
                where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumend.;
        quit;
    %end;
    /* Unquoted so the log flags this line as an ERROR */
    %if %length(&varstart)=0 or %length(&varend)=0 %then %do;
        %put ERROR: MISSING PARAMETERS. PLEASE CHECK YOUR MACRO CALL AND RERUN. MUST HAVE VARSTART AND VAREND OR VARNUMSTART AND VARNUMEND.;
        %abort;
    %end;
    /* View restricted to the column range and row range; nothing is copied */
    data _for_export/view=_for_export;
        set &libname..&dset;
        keep &varstart.--&varend.
        %if &keeplist ne %str() %then %do;
            &keeplist
        %end;
        ;
        if _N_ ge &recstart.;
        %if &recend ne 0 %then %do;
            if _N_ le &recend.;
        %end;
    run;
    proc export data=_for_export outfile=&outfile. dbms=excel replace;
        sheet=&sheet.;
    run;
    /* Clean up the temporary view */
    proc datasets nolist noprint lib=work;
        delete _for_export/memtype=view;
    quit;
%mend export_file;
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet1");
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet2");
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet3");
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet4");
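The view in %export_file selects the row range with subsetting IF statements on _N_, so it still reads every row of the input. Below is a sketch of the same midpoint split using the FIRSTOBS= and OBS= data set options, which stop reading once OBS= is reached. The view names are hypothetical. Note that OBS= is the number of the last observation to read, not a row count.

```sas
/* Rows 1 to 35,000 and rows 35,001 to the end of TEST as two views.
   OBS= gives the last observation number to read, not a row count. */
data first_half / view=first_half;
    set test(firstobs=1 obs=35000);
run;

data second_half / view=second_half;
    set test(firstobs=35001);
run;
```

Either view could then be passed to PROC EXPORT in place of _for_export.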
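When I tried it, exporting sheet4 failed. I'm not sure whether there is some limit on the total size of an .xls file, but you can easily modify this to create separate files instead. This won't work if you need to specify particular, non-contiguous variable names for each file, but you could fairly easily modify the SQL code so that, instead of pulling from dictionary.columns, it pulls from a table you create that contains the variable names you want in each file.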
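Following that suggestion, here is a sketch that drives the export from a control table and writes one CSV file per group, so the variables in a group do not need to be contiguous. Everything named here is hypothetical: the control table VARMAP (one row per variable, with its group number in GRP), the small DEMO data set standing in for TEST, the macro %export_groups, and the output file names. Writing CSV rather than Excel also avoids the column limit mentioned in the other answer.

```sas
/* Small stand-in for the TEST data set above (hypothetical) */
data demo;
    array xs x1-x1700;
    do id = 1 to 10;
        do _t = 1 to dim(xs);
            xs[_t] = ranuni(7);
        end;
        output;
    end;
    drop _t;
run;

/* Hypothetical control table: which output file (GRP) each variable goes to */
data varmap;
    length name $32;
    input grp name $;
    datalines;
1 x1
1 x7
1 x250
2 x3
2 x1700
;
run;

/* Hypothetical macro: one CSV per distinct GRP, each with IDVAR plus that group's variables */
%macro export_groups(dset=demo, map=varmap, idvar=id, outdir=c:\temp);
    %local ngrp g gval vars;
    /* One macro variable per distinct group value: GRP1, GRP2, ... */
    proc sql noprint;
        select distinct grp into :grp1 - :grp999 from &map;
    quit;
    %let ngrp = &sqlobs;
    %do g = 1 %to &ngrp;
        %let gval = &&grp&g;
        /* Space-separated list of the variables in this group */
        proc sql noprint;
            select name into :vars separated by ' '
                from &map where grp = &gval;
        quit;
        proc export data=&dset(keep=&idvar &vars)
            outfile="&outdir\group_&gval..csv"
            dbms=csv replace;
        run;
    %end;
%mend export_groups;

%export_groups(dset=demo, map=varmap, idvar=id, outdir=c:\temp);
```

With the sample control table this should write c:\temp\group_1.csv (x1, x7, x250, id) and c:\temp\group_2.csv (x3, x1700, id). Columns appear in data set order, not control-table order, because KEEP= does not reorder variables.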