I am importing CSV data in the following format:
SEDOL,12/08/2009,13/08/2009,14/08/2009,17/08/2009,18/08/2009
B1YVN39,7.8431,7.8431,7.8431,7.8431,7.598
B00G7R3,3.8,3.61,3.81,3.81,3.81
2965237,4.5351,4.5351,4.5351,4.5351,4.5351
2554345,7.355,7.355,7.355,7.355,7.355
I am using the following command:
PROC IMPORT OUT= want
DATAFILE= have
DBMS=CSV REPLACE;
RUN;
I then transpose the data to long format like this:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
proc print; run;
How can I import the dates correctly formatted and change the variable type from the default to a date?
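Answer 0 (score: 1)
Importing and then transposing is a convenient approach, but if you understand your data well, a small DATA step program can do everything in one pass: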
data want(keep=sedol v_date v_value);
  infile have dsd dlm=',' truncover;
  informat sedol $8. d1-d50 ddmmyy10. v1-v50 8.;
  format v_date yymmdd10.;
  array d(50) d1-d50;
  array v(50) v1-v50;
  /* Retain the date values and the count of dates */
  retain d1-d50 idx;
  /* Read header */
  if _n_ = 1 then do;
    input sedol d1-d50;
    /* loop to find how many date columns there are */
    do idx=1 to 50 while(d(idx) ne .);
    end;
    idx = idx - 1; /* must subtract one here */
    delete;
  end;
  /* Read data lines */
  input sedol v1-v50;
  do i=1 to idx;
    v_date = d(i);
    v_value = v(i);
    output;
  end;
run;
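As long as your input file looks exactly as you describe (a header record with a leading ID variable of up to 8 characters, followed by some number of date values that label the columns), this will handle up to 50 measurements per row. It should be easy to modify if your requirements change.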
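Note that `infile have` refers to a fileref named `have`, which has to be assigned before the step runs. A minimal sketch of the surrounding code, assuming the file is saved at `C:\have.csv` (the path is only an assumption, borrowed from the other answer):

```sas
/* Assumption: the CSV is saved at C:\have.csv; adjust the path to your file */
filename have "C:\have.csv";

/* ...run the DATA step from this answer here... */

/* Quick check of the long-format result */
proc print data=want(obs=5);
run;
```

With the sample file from the question this should yield 20 rows (4 SEDOLs times 5 dates), with `v_date` displayed in the form 2009-08-12 because of the `yymmdd10.` format.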
Answer 1 (score: 0)
In this case I would suggest importing the data and the header separately.
First, we import the data:
PROC IMPORT OUT= want
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
datarow=2;
RUN;
Then we import only the first row, which holds the variable names:
options obs=1;
PROC IMPORT OUT= header
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
RUN;
options obs=max;
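Then we transpose the header row into a column and "mask" the values that are illegal as SAS names: prepend a letter as the first character (any letter will do; I chose 'D') and replace every slash '/' with an underscore '_':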
proc transpose data=header out=header(drop=_name_);var _all_;run;
data header;
set header;
if anydigit(substr(COL1,1,1)) then COL1=cats("D",COL1);
COL1=translate(COL1,"_","/");
run;
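To see what the masking does to a single header value, here is a small standalone sketch. The variable `name` and its length of 32 are illustrative assumptions, not part of the answer's code:

```sas
data _null_;
  /* Assumption: 32 characters leaves room for the added prefix */
  length name $32;
  name = "12/08/2009";
  /* Prefix a letter when the value starts with a digit */
  if anydigit(substr(name,1,1)) then name = cats("D", name);
  /* Replace every slash with an underscore */
  name = translate(name, "_", "/");
  put name=;   /* writes name=D12_08_2009 to the log */
run;
```

One thing worth checking in the real step: if COL1 happens to be only as long as the longest header value (10 characters for these dates), assigning the prefixed value back to COL1 would truncate the last character, so it may be safer to give COL1 a longer length first.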
Put these new, "cleaned" column names into a macro variable:
proc sql noprint;
select COL1 into :names separated by ' '
from header;
quit;
Use the CALL EXECUTE routine to generate a DATA step that does the renaming:
data _null_;
  dsid=open("want","i");
  num=attrn(dsid,"nvars");
  call execute("data want;");
  call execute("set want;");
  call execute("rename");
  do i=1 to num;
    call execute(varname(dsid,i)||"="||scan("&names",i," "));
  end;
  call execute(";run;");
  rc=close(dsid);
run;
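To make the generated text concrete: for the six-column sample, and assuming PROC IMPORT named the columns VAR1 to VAR6 (its usual naming with GETNAMES=NO), the code queued by CALL EXECUTE should be equivalent to the step below. This is a sketch of the expected result, not output copied from a real log:

```sas
/* Assumed equivalent of the generated step for the sample file */
data want;
  set want;
  rename VAR1=SEDOL
         VAR2=D12_08_2009
         VAR3=D13_08_2009
         VAR4=D14_08_2009
         VAR5=D17_08_2009
         VAR6=D18_08_2009
  ;
run;
```

The generated step runs right after the `data _null_` step finishes, so the renamed variables are in place before the SORT and TRANSPOSE.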
Now your original SORT and TRANSPOSE:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
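Finally, "unmask" those dates (remove the leading D and replace _ with /) and convert them to real dates with INPUT(). The RETAIN statement is there only to put the new variable DATE in the second position, right after SEDOL.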
data transp;
retain SEDOL date;
set transp;
substr(_name_,1,1)='';
_name_=translate(_name_,"/","_");
date=input(strip(_name_),ddmmyy10.);
drop _name_;
format date ddmmyy10.;
run;
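The unmasking and conversion can be checked on one value in isolation. A minimal sketch, where `masked` and `unmasked` are illustrative names and `substr(masked, 2)` is used as an alternative way to drop the leading letter:

```sas
data _null_;
  masked = "D12_08_2009";
  /* Drop the leading D, then turn underscores back into slashes */
  unmasked = translate(substr(masked, 2), "/", "_");
  /* Read the text as a day/month/year date */
  date = input(unmasked, ddmmyy10.);
  put date= date9.;   /* writes date=12AUG2009 to the log */
run;
```

The result is a numeric SAS date, so any date format (`ddmmyy10.`, `yymmdd10.`, `date9.`) can be attached to it afterwards.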