Iterate over multiple txt files and create a new dataset for each file in SAS

Asked: 2013-10-09 05:40:13

Tags: macros import sas do-loops datastep

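I'm running into a problem with SAS. I have a bunch of monthly weather data in individual txt files. My immediate goal is to read these in and create a separate dataset for each file. Alternatively, I could see skipping this step and moving closer to my eventual goal of merging all of these datasets with another dataset by date and time. Below is my attempt at the problem. I figured a macro could work, iterating through the file names and creating matching dataset names, but apparently it does not. Also, to make the if / else if statements more efficient, I thought they could be replaced with a DO loop, but I can't figure out how. Any help is greatly appreciated!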

%macro loop; 
%do i = 11 %to 13; 
%do j = 01 %to 12; 
    %let year = i; 
    %let month = j;
    data _&year&month ; 
        infile "&path\hr_pit_&year..&month..txt" firstobs=27;  
        length Time $ 4 Month $ 3 Day $ 2 Year $ 4 temp 3; 
        input time $ Month $ 10-13 Day Year temp 32-34; 
        Date = Day||Month||Year;
        if time = '12AM' then time = 2400;
        else if time = '1AM ' then time = 100; 
        else if time = '2AM ' then time = 200; 
        else if time = '3AM ' then time = 300; 
        else if time = '4AM ' then time = 400; 
        else if time = '5AM ' then time = 500; 
        else if time = '6AM ' then time = 600; 
        else if time = '7AM ' then time = 700; 
        else if time = '8AM ' then time = 800; 
        else if time = '9AM ' then time = 900; 
        else if time = '10AM' then time = 1000;
        else if time = '11AM' then time = 1100; 
        else if time = '12PM' then time = 1200;
        else if time = '1PM ' then time = 1300;
        else if time = '2PM ' then time = 1400;
        else if time = '3PM ' then time = 1500;
        else if time = '4PM ' then time = 1600;
        else if time = '5PM ' then time = 1700;
        else if time = '6PM ' then time = 1800;
        else if time = '7PM ' then time = 1900;
        else if time = '8PM ' then time = 2000;
        else if time = '9PM ' then time = 2100;
        else if time = '10PM' then time = 2200;
        else if time = '11PM' then time = 2300;
        _time = input(time,4.);
        time = _time; 
        drop month day year; 
    run; 
%end; 
%end; 
%mend; 

%loop; run: 

In case anyone is wondering, this is what a typical txt file looks like: http://www.erh.noaa.gov/pbz/hourlywx/hr_pit_13.01

Here is a list of txt files of the same shape and form: http://www.erh.noaa.gov/pbz/hourlyclimate.htm

2 answers:

Answer 0 (score: 2)

First, fix these two lines:

%let year = &i; 
%let month = %sysfunc(putn(&j, z2.));

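This resolves the macro variables (&i and &j rather than the literal text i and j) and adds a leading zero to the month. The rest of the changes just handle AM / PM. The date is now numeric as well.

Full code: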
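To see what the two %let lines change, here is a minimal sketch that only writes the generated file names to the log and reads nothing. The macro name show_names is made up for illustration; the file name pattern is the one from the question.

```sas
/* Hypothetical helper: list the file names the loop would build. */
%macro show_names;
    %do i = 11 %to 13;
        %do j = 1 %to 12;
            /* z2. pads the month to two digits: 1 -> 01, 12 -> 12 */
            %let month = %sysfunc(putn(&j, z2.));
            /* the doubled dot ends the macro variable name and leaves one literal dot */
            %put hr_pit_&i..&month..txt;
        %end;
    %end;
%mend show_names;

%show_names;
```

With the original `%let year = i;` the name would contain the letter i instead of the loop value, which is why no file was found.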

%macro loop;
%do i = 11 %to 13;
%do j = 1 %to 12;
    %let year = &i;
    /* zero-pad the month so it matches the file names (01..12) */
    %let month = %sysfunc(putn(&j, z2.));
    data _&year&month;
        length Date 5 _Time $4 Time 8 Month $3 Day $2 Year $4 temp 3;
        format Date DATE9.;
        infile "&path\hr_pit_&year..&month..txt" firstobs=27;
        input _time $ Month $ 10-13 Day Year temp 32-34;
        /* right-align so the hour always sits in the first two characters */
        _time = right(_time);
        Date = input(Day||Month||Year, date9.);
        /* 12AM and 1PM-11PM get 12 added; 12PM and 1AM-11AM are used as-is */
        if _time = '12AM' or (_time ne '12PM' and index(_time, 'PM') > 1)
            then time = input(_time, 2.) + 12;
        else time = input(_time, 2.);
        time = time * 100;
        drop month day year;
    run;
    /* gather all data in one table */
    proc append base=work.all_data data=work._&year&month;
    run;
%end;
%end;
%mend;

/* drop any table left over from a previous run so the appends start from empty */
proc sql;
drop table work.all_data;
quit;
%let path=E:;
%loop;

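Answer 1 (score: 0)

It sounds like the best approach may be to read them all into one dataset and then merge them into the final dataset from there. I also think you would be better served by using real time values rather than 100-2400 (and the 2400 is inconsistent; if you go that route it really should be 000), since then you could just use input.

In any event, if you just read the text files in like this: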
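As a rough sketch of what "real time values" could look like: the step below turns strings such as 1AM or 12PM into SAS time values. It uses hms() on the extracted hour rather than a time informat, which is a substitution on my part, and the dataset name time_demo, the variable hour and the sample rows are invented for illustration.

```sas
data time_demo;
    length _time $4;
    input _time $;
    /* keep only the digits, then map 12 -> 0 and add 12 hours for PM */
    hour = mod(input(compress(_time, , 'kd'), 2.), 12)
           + 12 * (index(_time, 'PM') > 0);
    /* a SAS time value is the number of seconds since midnight */
    time = hms(hour, 0, 0);
    format time time5.;
    datalines;
12AM
1AM
12PM
11PM
;
run;
```

Here 12AM becomes hour 0 (midnight) rather than 2400, so every value falls in the same day and can be compared or merged on directly.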

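data my_text_files;
infile "c:\mydirectory\*.txt" lrecl=whatever eov=eovmark;
*firstobs=27 is only respected for the first file - so we have to track with eovmark;
/* load the next line and hold it; eovmark is only set once a line from the next file has been read */
input @;
if eovmark then do;
  /* a new file has started: eov is never reset automatically, so clear it and restart the count */
  eovmark=0;
  linecounter=0;
end;
linecounter+1;
/* skip the first 26 lines of every file, not just the first one */
if linecounter ge 27 then do;
  input (input statement);
  (any other code you want to execute here);
  output;
end;
run;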

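Then merge (however you like). If you need to know something about the file name, you can use the filename= option on the infile statement to get at that information.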
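A minimal sketch of the filename= option, assuming the same c:\mydirectory\*.txt wildcard as above. The variable names current_file, source_file and line are made up, and each line is read whole here rather than parsed. The variable named in filename= is not written to the output dataset, so it has to be copied into a regular variable.

```sas
data file_lines;
    /* declare the lengths first so the path is not truncated */
    length current_file source_file $260;
    infile "c:\mydirectory\*.txt" filename=current_file truncover;
    /* read each line as raw text; replace with the real input statement */
    input line $char200.;
    /* copy the name into a variable that is kept in the dataset */
    source_file = current_file;
run;
```

The year and month could then be parsed out of source_file, for example with scan(), instead of being carried in through macro variables.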