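SAS: Defining types when importing .xlsx with PROC IMPORT

Time: 2016-08-17 17:52:46

Tags: excel types import sas

Question: How do I define the variable types of variables imported from an .xlsx file when using PROC IMPORT?

My work

I am using SAS v9.4. As far as I know it is vanilla SAS; I don't have SAS/ACCESS or the like.

My data looks like this: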

ID1        ID2  MONTH   YEAR    QTR VAR1    VAR2
ABC_1234   1    1       2010    1   869     3988
ABC_1235   12   2       2010    1   639     3144
ABC_1236   13   3       2010    2   698     3714
ABC_1237   45   4       2010    2   630     3213

The program I am running is:

proc import out=rawdata
    datafile = "c:\rawdata.xlsx"
        dbms = xlsx replace;

    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
run;

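When I run this step, I get the following log output:

ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.

What this seems to tell me is that SAS assigns variable types automatically. I want to be able to control this manually, but I can't find documentation explaining how. The INFORMAT, LENGTH, and INPUT statements don't seem to apply to PROC IMPORT.

I am using PROC IMPORT because, overall, it has had the most success with .xlsx files. The two possible solutions I can think of are 1) converting the .xlsx to .csv and using INFILE in a DATA step, and 2) reading the data in as numeric and converting it to character in a later step. I don't like the first solution because it requires me to manipulate the data by hand, which is a potential source of error (e.g. dropping leading zeros). I don't like the second because it may inadvertently introduce errors (again, e.g. with leading zeros) and adds extraneous work.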

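3 Answers:

Answer 0: (score: 5)

You can try setting the column type to "Text" in Excel and see whether SAS picks up on it. It's worth a try.

If that doesn't work, then unless you use the PC Files Server, or have Excel of the same bitness installed on the same SAS server so the file can be accessed directly, you will need a separate data step to convert the column.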

proc import 
    file = "c:\rawdata.xlsx"
    out=_rawdata(rename=(ID2 = _ID2) )
    dbms = xlsx replace;
run;

data rawdata;
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;

    set _rawdata;

    ID2 = cats(_ID2);

    drop _:;
run;
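
If ID2 is meant to carry leading zeros, which the question raises as a concern, the numeric-to-character conversion could use PUT with the Zw. format instead of CATS. This is only a sketch: it assumes the `_rawdata` dataset produced by the PROC IMPORT step above and a fixed width of 3, and it cannot recover zeros that Excel itself never stored.

```sas
/* Sketch only: assumes _rawdata and _ID2 from the PROC IMPORT step above */
data rawdata;
    length ID2 $3;
    set _rawdata;

    /* Zw. pads with leading zeros: 1 becomes "001", 45 becomes "045" */
    if not missing(_ID2) then ID2 = put(_ID2, z3.);

    drop _:;
run;
```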

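If you have SAS/ACCESS Interface to PC Files, you can control these variables directly with the DBSASTYPE= data set option on the Excel LIBNAME engine (PROC IMPORT with DBMS=EXCEL passes such options through its DBDSOPTS= statement). For example:

libname myxlsx Excel 'C:\rawdata.xlsx';

data rawdata;
    set myxlsx.'Sheet1$'n(DBSASTYPE=(ID2='CHAR(3)'));
run;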

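The problem occurs because the xlsx engine in PROC IMPORT is an internal SAS engine, separate from the Excel engine. The Excel engine uses Microsoft Jet or ACE, whereas the xlsx engine uses a proprietary system that offers less control than Microsoft's. Why that is, I don't know.

When PROC IMPORT runs, SAS tries to guess what the format should be (for xls files this can be controlled with the GUESSINGROWS option). If it detects all numbers, it assumes a numeric variable. Unfortunately, without SAS/ACCESS to Excel or the PC Files Server installed, you cannot control the variable type directly.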
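
A quick way to see which types PROC IMPORT guessed is to inspect the imported dataset. A minimal sketch, assuming the dataset is WORK.RAWDATA as in the question:

```sas
/* Lists each variable with its type (Num or Char), length and format */
proc contents data=work.rawdata varnum;
run;
```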

Answer 1: (score: 0)

Define the type in Excel.

If you want to convert it later, use a data step to convert the column.

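Answer 2: (score: 0)

I solved this problem by not using PROC IMPORT. It is not a solution for everyone, but it works very well for my purposes (i.e., not "big data"). If you are reading Excel spreadsheets, it should work for you.

ImportDataFile is a macro [1] that automates a data step import. A data step import needs a LENGTH statement to define the variable names and types, an INPUT statement to read the raw data from the external file, and an INFILE statement to specify which file.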

data &dataset.;
  &infileStatement.;

  length &lengthStatement. ;

  input (_all_) (:) ;
run;

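The macro consists of three main steps:

  • Establish a DDE link if necessary (i.e., connect to Excel)
  • Get the data variables by reading the header
  • Read in the remaining data

Note how each of these corresponds to one of the three lines in the data step. Everything in the macro supports that data step.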
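
To make the three statements concrete, here is a hand-written version of the data step that the macro generates, for the CSV case. It is only a sketch: `c:\rawdata.csv` is a hypothetical CSV export of the question's sheet, and the lengths are illustrative rather than the macro's default of 100.

```sas
/* Sketch only: c:\rawdata.csv is a hypothetical CSV copy of the data */
data rawdata;
    /* INFILE: which file, how it is delimited, and skip the header row */
    infile "c:\rawdata.csv" dlm=',' dsd missover firstobs=2;

    /* LENGTH: declares every variable as character before INPUT runs */
    length ID1 $9 ID2 $3 MONTH YEAR QTR VAR1 VAR2 $8;

    /* INPUT: list input reads the delimited fields in order */
    input ID1 ID2 MONTH YEAR QTR VAR1 VAR2;
run;
```

Because LENGTH fixes each variable as character, SAS never gets a chance to guess a type.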

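In my experience, it is best to import the data as fixed-width character and then convert to whatever type is needed in a separate step. Yes, it is redundant, but I have never run into memory or space problems. The benefits far outweigh any hypothetical concerns. It makes the data flow the same for every analysis, which helps validation, and it saves time overall by avoiding the need to correct SAS's guesses at types (and the inevitable silent truncation).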
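
The separate conversion step might look like the following. This is a sketch under assumptions: `xl_import` and the `raw_` columns are hypothetical names matching the example call further down, and only some of the question's columns are shown.

```sas
/* Sketch only: xl_import and the raw_ columns are hypothetical names */
data typed;
    set xl_import;
    length id1 $9 id2 $3;

    id1   = strip(raw_id1);
    id2   = strip(raw_id2);            /* stays character, so zeros survive */
    month = input(raw_month, ?? 12.);  /* ?? suppresses log errors on bad values */
    year  = input(raw_year,  ?? 12.);

    drop raw_: ;
run;
```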

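Because SAS is a very verbose language, this answer exceeds StackOverflow's character limit for answers. A complete copy of the document is here: https://pastebin.com/raw/RsXz3juJ Put the code in a file named something like ImportDataFile.sas and make sure it runs before the macro is called (perhaps using %include). The call has the form: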

%ImportDataFile(   
       dirData=    
  ,   fileName=    
  ,    dataset=    
  ,  delimiter=    
  , overOption=    
  ,  headerRow=    
  ,      sheet=    
  ,      range=    
  ,     prefix=    
  ,       case=    
  ,  defLength=    
);                       

where

Output(s)     : SAS dataset, macro variable &listHeader                 
Inputs        :    dirData= Directory containing data file.             
                  fileName= Filename including file extension. Must be  
                            .csv, .txt, .tsv, .xls, or .xlsx.           
                   dataset= Name of dataset output to WORK library.     
                 delimiter= (optional) Delimiting string given in       
                            quotes. Default for CSV is a comma, for     
                            TXT/TSV a tab. This parameter may not be    
                            set for Excel files. Doing so generates a   
                            warning.                                    
                overOption= (optional) INFILE option. Default is        
                             MISSOVER.  Other choices are FLOWOVER,     
                             STOPOVER, TRUNCOVER, or SCANOVER.          
                 headerRow= (optional) Row corresponding to header in   
                            an Excel file. Accepts R#C#:R#C#, but       
                            should be given as R#. Default is R1.       
                     sheet= Name of worksheet. Required for XLS or XLSX.
                     range= Range of spreadsheet to be imported.        
                            Required for XLS and XLSX. Use form         
                            R#C#:R#C#.  See example below.              
                    prefix= (optional) String to prepend to each        
                            variable name. Default is no prefix.        
                      case= (optional) Toggle mix case variable naming. 
                            Must be lower/upper/mixed. Default is       
                            lower.                                      
                 defLength= (optional) Character field length.  Default 
                            value is 100.                               

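For example, the following creates a dataset named xl_import from my_xl_file.xlsx located in C:\Path\To\File, with every column of character type and width 100. Columns are prefixed with the string "raw_". The overOption values correspond to those defined for the INFILE statement.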

%ImportDataFile(              
       dirData= C:\Path\To\File
  ,   fileName= my_xl_file.xlsx      
  ,    dataset= xl_import     
  ,     prefix= raw_          
  ,      sheet= Sheet1     
  ,      range= R2C1:R13C18   
  ,  defLength= 100           
  , overOption= MISSOVER      
);                            
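
For a delimited file the call is shorter, since the macro rejects SHEET and RANGE for non-Excel files and defaults the delimiter to a comma for CSV. A sketch, in which both paths and the file name are hypothetical:

```sas
/* Sketch only: both paths and the file name are hypothetical */
%include "C:\Path\To\Macros\ImportDataFile.sas";

%ImportDataFile(
       dirData= C:\Path\To\File
  ,   fileName= my_csv_file.csv
  ,    dataset= csv_import
  ,     prefix= raw_
  ,  defLength= 100
);
```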

Here is the code for the macro. Enjoy.

********************************************************************
** Utilities / Sub Macros
********************************************************************;
%macro ClearFileRef(fileRef);
  filename &fileRef. clear;
%mend;

%macro CompareVariablesToDDERange();
  %local columnIndex numberOfDDEColumns;

  %let columnIndex        = %eval(%sysfunc(findc(&range., 'C', ib)) + 1);
  %let numberOfDDEColumns = %sysfunc(substr(&range., &columnIndex));
  %if %ListLength(&listHeader) ^= &numberOfDDEColumns %then
    %put WARNING: [MACRO] Data file contains %ListLength(&listHeader) variables. RANGE argument has &numberOfDDEColumns columns.;
%mend;

%macro EstablishSystemLink(fileRef);
  filename &fileRef. dde 'excel|system';
%mend;

%macro EstablishWorkbookLink(fileRef, dirData, fileName, sheetName, range);
  filename &fileRef. dde "excel|&dirData.\[&fileName.]&sheetName.!&range.";
%mend;

%macro IsEmpty(macroVariable);
  %sysevalf(%superq(&macroVariable)=, boolean)
%mend;

%macro IsFileRef(reference);
  %local fileRefExists externalFileExists returnValue;

  %let fileRefExists      = %sysfunc(fexist(&reference.));
  %let externalFileExists = %sysfunc(fileexist(&reference.));
  %if &fileRefExists. = 1 and &externalFileExists. = 0 %then %let returnValue = 1;
  %else %let returnValue = 0;
  &returnValue
%mend;

%macro IsFilePath(reference);
  %local fileRefExists externalFileExists returnValue;

  %let fileRefExists      = %sysfunc(fexist(&reference.));
  %let externalFileExists = %sysfunc(fileexist(&reference.));
  %if &fileRefExists. = 0 and &externalFileExists. = 1 %then %let returnValue = 1;
  %else %let returnValue = 0;
  &returnValue
%mend;

%macro GetObsCount(dataset);
  %local exists returnValue closed;

  %let exists = %sysfunc(open(&dataset));
  %if &exists. %then %do;
    %let returnValue  = %sysfunc(attrn(&exists, nobs));
    %let closed       = %sysfunc(close(&exists));
    %end;
  %else %do;
    %put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
    %abort cancel;
    %end;
  &returnValue
%mend;

%macro GetVarCount(dataset);
  %local exists varCount closed;

  %let exists = %sysfunc(open(&dataset));
  %if &exists. %then %do;
    %let varCount = %sysfunc(attrn(&exists, nvars));
    %let closed   = %sysfunc(close(&exists));
    %end;
  %else %do;
    %put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
    %abort cancel;
    %end;
  &varCount
%mend;

%macro ListLength(list);
  %local count;

  %if %sysevalf(%superq(list)=, boolean) %then %let count = 0;
  %else %let count = %eval(%sysfunc(countc(&list., |)) + 1);
  &count
%mend;

%macro ListElement(list, n);
  %local nthElement;

  %let nthElement = %sysfunc(scan(%superq(&list.), &n., |, m));
  &nthElement
%mend;

/* Keyword parameters, because the caller passes fileRef= and sheet= by name */
%macro RemoveAllFormattingFromSheet(fileRef=, sheet=);
  data _null_;
    file &fileRef.;
    /* Select sheet of interest */
    put "[WORKBOOK.ACTIVATE(""&sheet."")]";
    /* Select first cell */
    put '[FORMULA.GOTO("R1C1")]';
    /* Apply dummy filter of ">2" to first column */
    put '[FILTER(1, ">2")]';
    /* Disable filters */
    put '[FILTER()]';
    /* Select all */
    put '[SELECT("R[0]C[0]:R[1048575]C[16383]", "R[0]C[0]")]';
    /* Unhide rows */
    put '[ROW.HEIGHT(,,TRUE, 2)]';
    /* Unhide columns */
    put '[COLUMN.WIDTH(,,TRUE, 2)]';
    /* Remove all formatting */
    put '[CLEAR(2)]';
    /* Autofit column width */
    put '[COLUMN.WIDTH(,,TRUE, 3)]';
  run;
%mend;

%macro SetSystemOptions(opt1, opt2, opt3);
  options &opt1. &opt2. &opt3.;
%mend;

%macro ImportDataFile(dirData=, fileName=, dataset=, delimiter=, overOption=MISSOVER, headerRow=R1, sheet=, range=, prefix=, case=lower, defLength=100) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ImportDataFile(dirData=&dirData, fileName=&fileName, dataset=&dataset, delimiter=&delimiter, overOption=&overOption, headerRow=&headerRow, sheet=&sheet, range=&range, prefix=&prefix, case=&case, defLength=&defLength);

  %local
    macroStart
    case
    extension
    HeaderRef
    lengthStatement
    delimiter
    InfileRef
    infileStatement
    numberOfRecords
    numberOfVars
    duration
   ;

  %global
    listHeader
    originalNOTES
    originalQUOTELENMAX
  ;

  %let macroStart           = %sysfunc(datetime());
  %let originalNOTES        = %sysfunc(getoption(notes));
  %let originalQUOTELENMAX  = %sysfunc(getoption(quotelenmax));

  %SetSystemOptions(nonotes);

********************************************************************
** Validation
********************************************************************;
  %if %IsEmpty(dirData) %then %do;
    %put ERROR: [&SYSMACRONAME.] DIRDATA argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if %IsEmpty(fileName) %then %do;
    %put ERROR: [&SYSMACRONAME.] FILENAME argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if %IsEmpty(dataset) %then %do;
    %put ERROR: [&SYSMACRONAME.] DATASET argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not(%IsEmpty(prefix)) and not(%sysfunc(nvalid(&prefix, v7))) %then %do;
    %put ERROR: [&SYSMACRONAME.] Invalid PREFIX="&prefix.";
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %let case = %upcase(&case.);

  %if not(&case. in (LOWER, UPPER, MIXED)) %then %do;
    %put ERROR: [&SYSMACRONAME.] Invalid case option: &case. Must be LOWER, UPPER, MIXED.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %let extension  = %upcase(%scan(&fileName., 1, '.', b));

  %if not(&extension. in (TXT, TSV, CSV, XLS, XLSX)) %then %do;
    %put ERROR: [&SYSMACRONAME.] Invalid file type: &extension. Must be TXT, TSV, CSV, XLS, XLSX.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if &extension. in (XLS, XLSX) and %IsEmpty(sheet) %then %do;
    %put ERROR: [&SYSMACRONAME.] SHEET argument undefined.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if &extension. in (XLS, XLSX) and %IsEmpty(range) %then %do;
    %put ERROR: [&SYSMACRONAME.] RANGE argument undefined.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(sheet)) %then %do;
    %put ERROR: [&SYSMACRONAME.] SHEET argument only valid for XLS or XLSX files.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(range)) %then %do;
    %put ERROR: [&SYSMACRONAME.] RANGE argument only valid for XLS or XLSX files.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

**********************************
*** Define delimiter
**********************************;
 %if %IsEmpty(delimiter) %then %do;
    %if       &extension. in (XLS, XLSX)  %then %let delimiter = '09'x;
    %else %if &extension. = CSV           %then %let delimiter = ',';
    %else %if &extension. in (TXT, TSV)   %then %let delimiter = '09'x;
    %else %do;
      %put ERROR: [&SYSMACRONAME.] Delimiter error.;
      %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
      %abort cancel;
      %end;
    %end;

  %if &extension. in (XLS, XLSX) and &delimiter ^= '09'x %then %do;
    %put WARNING: [&SYSMACRONAME.] Delimiter for Excel file must be '09'x.;
    %put WARNING: [&SYSMACRONAME.] Delimiter set to '09'x.;
    %let delimiter = '09'x;
    %end;

********************************************************************
** Prep Excel Worksheet
********************************************************************;
  %if &extension. in (XLS, XLSX) %then %do;
    %let DDECommandRef = DDEcmd;
    %EstablishDDELink(fileRef=&DDECommandRef.);
    %RemoveAllFormattingFromSheet(fileRef=&DDECommandRef., sheet=&sheet.);
    %end;

********************************************************************
** Get header
********************************************************************;

**********************************
*** Define file reference
**********************************;
  %if &extension. in (XLS, XLSX) %then %do;
    %let HeaderRef = DDEHead;
    %EstablishDDELink(
      fileRef= &HeaderRef.
      ,   dirData= &dirData.
      ,  fileName= &fileName.
      , sheetName= &sheet.
      ,     range= &headerRow.
    );
    %end;
  %else %if &extension. in (CSV, TXT, TSV) %then
    %let HeaderRef = %sysfunc(dequote(&dirData.))\&fileName.;

  %ReadHeaderIntoList(reference=&HeaderRef., delimiter=&delimiter., prefix=&prefix., case=&case.);

********************************************************************
** Create length statement
********************************************************************;
  %let lengthStatement = %CreateLengthStatement(&listHeader., &defLength.);

********************************************************************
** Import data
********************************************************************;

**********************************
*** Define infile statement
**********************************;
  %if &extension. in (XLS, XLSX) %then %do;
    %let InfileRef = DDESheet;
    %EstablishDDELink(
      fileRef= &InfileRef.
      ,   dirData= &dirData.
      ,  fileName= &fileName.
      , sheetName= &sheet.
      ,     range= &range.
    );
    %let infileStatement = infile &InfileRef. dlmstr=&delimiter. dsd notab &overOption.;
    %CompareVariablesToDDERange();
    %end;
  %else %if &extension. in (CSV, TXT, TSV) %then %do;
    %let InfileRef       = %sysfunc(dequote(&dirData.))\&fileName.;
    %let infileStatement = infile "&InfileRef." dlmstr=&delimiter. dsd &overOption. firstobs = 2 end=last_record;
    %end;

**********************************
*** Perform import
**********************************;
  data &dataset.;
    &infileStatement.;

    length &lengthStatement. ;

    input (_all_) (:) ;

  run;

********************************************************************
** Housekeeping
********************************************************************;
  %let numberOfRecords = %GetObsCount(&dataset.);
  %let numberOfVars    = %GetVarCount(&dataset.);

  %SetSystemOptions(notes);

  %put;
  %put NOTE: [MACRO] The dataset WORK.%upcase(&dataset.) has &numberOfRecords. observations and &numberOfVars. variables.;
  %put NOTE: [MACRO] IMPORTDATAFILE macro used (Total process time):;

  %let duration = %sysfunc(putn(%sysevalf(%sysfunc(datetime()) - &macroStart.), time12.3));
  %if %sysfunc(minute("&duration."t)) > 0 %then %do;
    %put NO%str(TE-)         real time            %substr(&duration., 3, 8);
    %end;
  %else %do;
    %put NO%str(TE-)         real time            %substr(&duration., 6, 5) seconds;
    %end;

  %put;

  %SetSystemOptions(&originalNotes., &originalQUOTELENMAX.);

%mend;

/* Keyword parameters, because every caller passes the arguments by name */
%macro  EstablishDDELink(fileRef=, dirData=, fileName=, sheetName=, range=);
%put NOTE: [&SYSMACRONAME] Executing: EstablishDDELink(fileRef=&fileRef, dirData=&dirData, fileName=&fileName, sheetName=&sheetName, range=&range);

  %local dirData linkConnection stopTime closeReturnCode;

********************************************************************
** Validate arguments
********************************************************************;
  %if %IsEmpty(fileRef) %then %do;
    %put ERROR: [&SYSMACRONAME] fileRef is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if %length(&fileRef.) > 8 %then %do;
    %put ERROR: [&SYSMACRONAME] Fileref &fileRef exceeds 8 character limit.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not %IsEmpty(dirData) %then %let dirData = %sysfunc(dequote(&dirData.));

********************************************************************
** Assign fileref according to link type
********************************************************************;
  %if     %IsEmpty(dirData)
      and %IsEmpty(fileName)
      and %IsEmpty(sheetName)
      and %IsEmpty(range) %then %EstablishSystemLink(&fileRef.);
  %else %EstablishWorkbookLink(&fileRef., &dirData., &fileName., &sheetName., &range.);

********************************************************************
** Check that link has been established
********************************************************************;
  %let linkConnection = %sysfunc(fopen(&fileRef, S));

  %if not (&linkConnection. > 0) %then %do;

    /*Run until either Excel opens (linkConnection > 0)
      or until 10 seconds have passed.*/
    %let stopTime = %sysevalf(%sysfunc(datetime()) + 10);

    %do %until (&linkConnection. > 0);
      %if (%sysfunc(datetime()) >= &stopTime.) %then %do;
        %put ERROR: [&SYSMACRONAME] DDE system link was not established. Operation timed out.;
        %ClearFileRef(&fileRef.);
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;

      %let linkConnection = %sysfunc(fopen(&fileRef, S));
      %end;
    %end;

********************************************************************
** Housekeeping
********************************************************************;
  %let closeReturnCode = %sysfunc(fclose(&linkConnection));

%mend;

/* Keyword parameters, because ImportDataFile passes the arguments by name */
%macro  ReadHeaderIntoList(reference=, delimiter=, prefix=, case=) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ReadHeaderIntoList(reference=&reference, delimiter=&delimiter, prefix=&prefix, case=&case);

  %local  fileSpecification notab delimiter;
  %global listHeader;

  %SetSystemOptions(nonotes);

  %if %IsEmpty(reference) %then %do;
    %put ERROR: [&SYSMACRONAME.] REFERENCE argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

********************************************************************
** Determine infile statement options
********************************************************************;
  /*SAS filerefs exist only for Excel files*/
  %if       %IsFileRef(&reference.)  %then %do;
    %let fileSpecification  = &reference.;
    %let notab              = notab;
    %end;
  /*Absolute references only for CSV,TXT,TSV files*/
  %else %if %IsFilePath(&reference.) %then %do;
    %let fileSpecification  = "&reference.";
    %let notab              = ;
    %let extension          = %upcase(%scan(&reference., 1, '.', b));
    %end;
  %else %do;
    %put ERROR: [&SYSMACRONAME.] Invalid input REFERENCE: [&reference.];
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

********************************************************************
** Read in header
********************************************************************;
  data _null_;
    infile &fileSpecification. dlmstr = '```#@' &notab. obs = 1 lrecl = 32767 ;
    length
      raw_header_line   $ 32767
      raw_with_pipes    $ 32767
    ;
    input raw_header_line;

    raw_with_pipes  = tranwrd(raw_header_line, &delimiter., '|');
    call symput('rawListHeader', strip(raw_with_pipes));
  run;

********************************************************************
** Transform headers into valid variable names
********************************************************************;
  %SetSystemOptions(noquotelenmax);
  data _null_;
    length
      i           8
      listLength  8
      header_i    $ 32767
      temp_i      $ 32767
      listValid   $ 32767
    ;
    listLength = %ListLength(%superq(rawListHeader));

    do i = 1 to listLength;
      header_i = scan("%superq(rawListHeader)", i, '|', 'm');

**********************************
*** Apply prefix
**********************************;
      if not missing(header_i) then prefixed_i = cats("&prefix.", header_i);
      else                          prefixed_i = header_i;

**********************************
*** Apply case
**********************************;
      if      "&case." = "LOWER" then cased_i = lowcase(prefixed_i);
      else if "&case." = "UPPER" then cased_i = upcase(prefixed_i);
      else                            cased_i = prefixed_i;

**********************************
*** Keep valid otherwise correct
**********************************;
      if nvalid(cased_i, 'v7') then do;
    if i = 1 then listValid = cased_i;
    else          listValid = catx('|', listValid, cased_i);
    end;
      else do;

**********************************
*** Fill in blank headers
**********************************;
      if missing(cased_i) and "&case." = "UPPER" then temp_i = "%upcase(&prefix.)NO_HEADER";
      else if missing(cased_i)                   then temp_i = "&prefix.no_header";

**********************************
*** Replace blanks with _ and
*** Remove invalid characters
**********************************;
      else do;
    replaced_space_with_underscore = tranwrd(strip(cased_i), ' ', '_');
    temp_i = compress(replaced_space_with_underscore, '_', 'kin');
    end;

**********************************
*** Make first char _ if digit
**********************************;
    if anydigit(temp_i) = 1 then temp_i = cats('_', temp_i);

**********************************
*** Trim length to 32
**********************************;
    if length(temp_i) > 32 then temp_i = substr(temp_i, 1, 32);

**********************************
*** Verify valid V7 name
**********************************;
    if not nvalid(temp_i, 'v7') then do;
      put "ERROR: [&SYSMACRONAME.] Error cleaning header " i +(-1) ". Invalid SAS name.";
      call execute('
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        data _null_;
          abort cancel nolist;
        run;');
      stop;
      end;

    if i = 1 then listValid = temp_i;
    else          listValid = catx('|', listValid, temp_i);
    end;

      output;
    end;
    call symput('listValid', strip(listValid));
  run;

********************************************************************
** Append repeated headers with incremented value
********************************************************************;
  /*Use hash table with key being each header and value
    corresponding to the number of occurrences.  Create new
    header list as follows: If first occurrence of a header,
    add to list.  If not first occurrence, ruthlessly append
    occurrence number (ensuring validity) and add to list.
    Beware: SAS documentation for hashes contains syntax
    errors.*/
  data _null_;
    length
      element_i   $ 32
      item        $ 32
      occurrences 8
      new_list    $ 32767
    ;

    declare hash h();
    h.defineKey('item');
    h.defineData('item', 'occurrences');
    h.defineDone();
    call missing(item, occurrences);

    listLength = input("%ListLength(&listValid.)", 8.);
    do i = 1 to listLength;
      element_i = scan("&listValid.", i, '|');

      if not (h.find(key: element_i) = 0) then do;
        /* First occurrence: record it and keep the header as is */
        h.add(key: element_i, data: element_i, data: 1);
        new_list = catx('|', new_list, element_i);
        end;
      else do;
        /* Repeat: bump the count and append it, trimming to 32 characters */
        occurrences + 1;
        h.replace(key: element_i, data: element_i, data: occurrences);

        len     = length(element_i);
        digits  = ceil(log10(occurrences + 1));

        if (len + digits) > 32 then
          new_element = cats(substr(element_i, 1, len - digits), occurrences);
        else new_element = cats(element_i, occurrences);

        new_list = catx('|', new_list, new_element);
        end;
    end;

    call symput('listHeader', strip(new_list));
  run;
%mend;

%macro  CreateLengthStatement(listHeader, defLength);
  %local lengthStatement header_h;

  %let lengthStatement=;
  %do h = 1 %to %ListLength(&listHeader.);
  %let header_h = %ListElement(listHeader, &h);
    %if &h. = 1 %then %let lengthStatement = &header_h. $ &defLength. ;
    %else %let lengthStatement = &lengthStatement. &header_h. $ &defLength. ;
  %end;
  %let lengthStatement = &lengthStatement;
  &lengthStatement
%mend;

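[1] This solution makes extensive use of macros. In my experience, people have advised me to avoid macros. With all due respect, I have found it best to ignore that advice. SAS doesn't have functions, which makes developing abstractions difficult. Macros let you mimic functions. The common fear about macros is debugging. Stick to the Single Responsibility Principle and you will find they are not hard to debug at all. Log them with %put statements and you will know who gets called and when. If you are not familiar with macros, they are really just text substitution: the code goes through a preprocessor, which replaces the macro code with text, and then that text is executed along with the rest of the code. The best resource for learning macros is the manual.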
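
As an illustration of a macro that mimics a function and logs its own call, consider the sketch below. `AddOne` is a hypothetical macro invented for this note and is not part of ImportDataFile.

```sas
/* Sketch only: AddOne is a hypothetical macro, not part of ImportDataFile */
%macro AddOne(n);
  /* %put writes to the log and contributes no text to the result */
  %put NOTE: [&SYSMACRONAME.] Executing: AddOne(n=&n.);

  /* The text left behind by the macro acts as its return value */
  %eval(&n. + 1)
%mend;

%let answer = %AddOne(41);
%put NOTE: answer=&answer.;
```

The only text the macro leaves behind is the result of %eval, so the call can sit on the right-hand side of a %let exactly like a function call.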