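Question: How do I define the variable types of variables imported from an .xlsx file with PROC IMPORT?
What I've tried
I'm using SAS v9.4. As far as I know, it's vanilla SAS. I don't have SAS/ACCESS or similar products.
My data looks like this: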
ID1 ID2 MONTH YEAR QTR VAR1 VAR2
ABC_1234 1 1 2010 1 869 3988
ABC_1235 12 2 2010 1 639 3144
ABC_1236 13 3 2010 2 698 3714
ABC_1237 45 4 2010 2 630 3213
The program I'm running is:
proc import out=rawdata
datafile = "c:\rawdata.xlsx"
dbms = xlsx replace;
format ID1 $9. ;
format ID2 $3. ;
format MONTH best2. ;
format YEAR best4. ;
format QTR best1. ;
format VAR1 best3. ;
format VAR2 best4. ;
run;
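When I run this step, I get the following log output:
ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.
This seems to tell me that SAS assigns variable types automatically. I want to control them manually, but I can't find documentation explaining how. The INFORMAT, LENGTH, and INPUT statements don't seem to apply to PROC IMPORT.
I'm using PROC IMPORT because, overall, it has worked best for me with .xlsx files. I can think of two possible solutions: 1) convert the .xlsx to .csv and read it with INFILE in a DATA step, or 2) import the data as numeric and convert it to character in a later step. I don't like the first because it requires me to handle the data manually, which is a potential source of errors (for example, dropped leading zeros). I don't like the second because it can also introduce errors unintentionally (again, for example with leading zeros) and adds extra work.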
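For comparison, option 1 (a DATA step instead of PROC IMPORT) lets you declare every type before anything is read, so nothing is guessed. A minimal sketch, assuming the sheet has been exported with a header row; DATALINES stands in for the CSV so the step runs on its own. Against a real file, the INFILE line would be something like `infile "c:\rawdata.csv" dsd firstobs=2 truncover;` (a hypothetical path).

```sas
data rawdata;
    /* With a real CSV: infile "c:\rawdata.csv" dsd firstobs=2 truncover; */
    infile datalines dsd dlm=',' truncover;
    /* Declare types and lengths before INPUT so SAS does not guess them */
    length ID1 $9 ID2 $3 MONTH YEAR QTR VAR1 VAR2 8;
    input ID1 $ ID2 $ MONTH YEAR QTR VAR1 VAR2;
    datalines;
ABC_1234,1,1,2010,1,869,3988
ABC_1235,12,2,2010,1,639,3144
ABC_1236,13,3,2010,2,698,3714
ABC_1237,45,4,2010,2,630,3213
;
run;
```

Because ID2 is read as text, whatever characters are in the file are kept as-is; this only protects leading zeros if they are still present in the exported CSV.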
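Answer 0 (score: 5)
You could try setting the column type to "Text" in Excel and see whether SAS picks it up from there. It's worth a try.
If that doesn't work, then unless you use the PC Files Server, or have Excel of the same bitness installed on the same SAS server so that SAS can access the file directly, you will need to convert the column in a separate DATA step.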
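/* Import as usual, but rename ID2 so a character ID2 can be created later */
proc import
    file = "c:\rawdata.xlsx"
    out = _rawdata(rename=(ID2 = _ID2))
    dbms = xlsx replace;
run;
data rawdata;
    /* These FORMAT statements come before SET, so they define the variables:
       $3. makes the new ID2 a character variable of length 3 */
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
    set _rawdata;
    /* Convert the imported numeric value to character */
    ID2 = cats(_ID2);
    /* Drop the renamed temporary (_ID2) and any other variable starting with _ */
    drop _:;
run;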
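If you have SAS/ACCESS to Excel, you can control these variables directly with the DBSASTYPE= data set option, which overrides the type SAS would otherwise assign when reading. For example:
libname myxlsx Excel 'C:\rawdata.xlsx';
data rawdata;
    set myxlsx.'Sheet1$'n(dbsastype=(ID2='CHAR(3)'));
run;
libname myxlsx clear;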
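The same kind of override is also available from PROC IMPORT itself when it uses the Excel engine. A minimal sketch, assuming SAS/ACCESS Interface to PC Files is licensed and the data sit on a sheet named Sheet1 (an assumption): the DBDSOPTS= statement passes data set options such as DBSASTYPE= through to the engine. As far as I know, this statement is not available with DBMS=XLSX, which is why the question's program cannot use it.

```sas
/* Requires SAS/ACCESS Interface to PC Files (DBMS=EXCEL, not XLSX) */
proc import out=rawdata
            datafile="c:\rawdata.xlsx"
            dbms=excel replace;
    sheet="Sheet1";                         /* assumed sheet name */
    getnames=yes;
    /* Pass DBSASTYPE= through to the engine so ID2 is read as CHAR(3) */
    dbdsopts="dbsastype=(ID2='CHAR(3)')";
run;
```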
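The problem occurs because the xlsx engine in PROC IMPORT is internal to SAS and separate from the Excel engine. The Excel engine uses Microsoft Jet or ACE, while the xlsx engine uses a proprietary system that offers less control than Microsoft's. Why that is, I don't know.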
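When you run PROC IMPORT, SAS tries to guess what type each column should be (for .xls files, this can be controlled with the GUESSINGROWS option). If a column contains only numbers, SAS makes it a numeric variable. Unfortunately, without SAS/ACCESS to Excel or the PC Files Server, you cannot control the variable type directly.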
Answer 1 (score: 0)
Define the types in Excel.
If you want to convert a column later, do the conversion in a DATA step.
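A minimal sketch of the "convert later" route. It assumes ID2 should be a zero-padded three-character code; that is an assumption about the data, so drop the Z3. format (for example, use CATS instead) if the IDs are not fixed-width. A small DATALINES step stands in for the PROC IMPORT output so the example runs on its own.

```sas
/* Stand-in for PROC IMPORT output: ID2 arrives as numeric */
data imported;
    input ID1 :$9. ID2;
    datalines;
ABC_1234 1
ABC_1235 12
ABC_1236 13
ABC_1237 45
;
run;

/* Convert ID2 to character in a separate DATA step */
data converted;
    length ID1 $9 ID2 $3;
    set imported(rename=(ID2 = _ID2));
    ID2 = put(_ID2, z3.);   /* assumed zero-padding: 1 -> "001", 45 -> "045" */
    drop _ID2;
run;
```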
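Answer 2 (score: 0)
I solved this by not using PROC IMPORT. It isn't a solution for everyone, but it works very well for my purposes (i.e., not "big data"). If you are reading Excel spreadsheets, it should work for you too, provided Excel is installed on the machine running SAS, because the macro talks to Excel through DDE.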
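ImportDataFile is a macro¹ that automates a DATA step import. A DATA step import needs a LENGTH statement to define the variable names and types, an INPUT statement to read the raw data from the external file, and an INFILE statement to specify which file.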
data &dataset.;
&infileStatement.;
length &lengthStatement. ;
input (_all_) (:) ;
run;
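The macro consists of three main steps: reading the header row to build the LENGTH statement, building the INFILE statement for the file type, and running the INPUT statement that reads the data.
Note how each of these corresponds to one of the three lines in the data step. Everything else in the macro exists to support that data step.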
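To make that mapping concrete, here is a hand-written sketch of roughly what the data step amounts to for the question's data. It is not actual macro output: DATALINES stands in for the DDE fileref, the header row is omitted, commas replace tabs, the width is 20 rather than the default 100, and the INPUT statement spells out the informat (:$20.) instead of the macro's shorter (:) form.

```sas
data xl_import;
    /* Plays the role of &infileStatement. (the macro uses a DDE fileref) */
    infile datalines dsd dlm=',' missover;
    /* Plays the role of &lengthStatement.: one character variable per header */
    length id1 $ 20 id2 $ 20 month $ 20 year $ 20
           qtr $ 20 var1 $ 20 var2 $ 20;
    /* Read every variable defined so far, in order, with list input */
    input (_all_) (:$20.);
    datalines;
ABC_1234,1,1,2010,1,869,3988
ABC_1235,12,2,2010,1,639,3144
ABC_1236,13,3,2010,2,698,3714
ABC_1237,45,4,2010,2,630,3213
;
run;
```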
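In my experience, it is best to import the data as fixed-width character variables and then convert them to whatever types you need in a separate step. Yes, this is redundant, but I have never run into memory or space problems, and the benefits far outweigh any hypothetical concerns. It keeps the data flow identical for every analysis, which makes validation easier, and it saves time overall because you never have to correct SAS's guesses about types (and the inevitable silent truncation).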
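A minimal sketch of that separate conversion step, assuming an all-character dataset like the one the macro produces (a small DATALINES step stands in for it here). Each numeric type is declared explicitly with the INPUT function rather than guessed; values that cannot be read become missing and are noted in the log, so problems stay visible.

```sas
/* Stand-in for an all-character import (e.g. ImportDataFile output) */
data xl_import;
    length id1 $ 20 month $ 20 var1 $ 20;
    input id1 $ month $ var1 $;
    datalines;
ABC_1234 1 869
ABC_1235 2 639
ABC_1236 3 698
ABC_1237 4 630
;
run;

/* Separate conversion step: each numeric type is declared explicitly */
data typed;
    set xl_import(rename=(month = _month var1 = _var1));
    month = input(_month, 32.);   /* unreadable values become missing, with a log note */
    var1  = input(_var1, 32.);
    drop _:;                      /* remove the renamed character originals */
run;
```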
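Because SAS is a very verbose language, this answer exceeds StackOverflow's character limit for answers. The complete code is on Pastebin at https://pastebin.com/raw/RsXz3juJ and is also reproduced below. Put the code in a file named something like ImportDataFile.sas, and make sure it runs before you call the macro (for example, with %include). The call has the form: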
%ImportDataFile(
dirData=
, fileName=
, dataset=
, delimiter=
, overOption=
, headerRow=
, sheet=
, range=
, prefix=
, case=
, defLength=
);
where
Output(s) : SAS dataset, macro variable &listHeader
Inputs : dirData= Directory containing data file.
fileName= Filename including file extension. Must be
.csv, .txt, .tsv, .xls, or .xlsx.
dataset= Name of dataset output to WORK library.
delimiter= (optional) Delimiting string given in
quotes. Default for CSV is a comma, for
TXT/TSV a tab. This parameter may not be
set for Excel files. Doing so generates a
warning.
overOption= (optional) INFILE option. Default is
MISSOVER. Other choices are FLOWOVER,
STOPOVER, TRUNCOVER, or SCANOVER.
headerRow= (optional) Row corresponding to header in
an Excel file. Accepts R#C#:R#C#, but
should be given as R#. Default is R1.
sheet= Name of worksheet. Required for XLS or XLSX.
range= Range of spreadsheet to be imported.
Required for XLS and XLSX. Use form
R#C#:R#C#. See example below.
prefix= (optional) String to prepend to each
variable name. Default is no prefix.
case= (optional) Toggle mix case variable naming.
Must be lower/upper/mixed. Default is
lower.
defLength= (optional) Character field length. Default
value is 100.
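For example, the following creates a dataset named xl_import from my_xl_file.xlsx, located in C:\Path\To\File. All variables are character with a width of 100, and the column names are prefixed with the string "raw_". The overOption values correspond to the options of the same name in the INFILE statement.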
%ImportDataFile(
dirData= C:\Path\To\File
, fileName= my_xl_file.xlsx
, dataset= xl_import
, prefix= raw_
, sheet= Sheet1
, range= R2C1:R13C18
, defLength= 100
, overOption= MISSOVER
);
Here is the code for the macro. Enjoy.
********************************************************************
** Utilities / Sub Macros
********************************************************************;
%macro ClearFileRef(fileRef);
filename &fileRef. clear;
%mend;
%macro CompareVariablesToDDERange();
%local columnIndex numberOfDDEColumns;
%let columnIndex = %eval(%sysfunc(findc(&range., 'C', ib)) + 1);
%let numberOfDDEColumns = %sysfunc(substr(&range., &columnIndex));
%if %ListLength(&listHeader) ^= &numberOfDDEColumns %then
%put WARNING: [MACRO] Data file contains %ListLength(&listHeader) variables. RANGE argument has &numberOfDDEColumns columns.;
%mend;
%macro EstablishSystemLink(fileRef);
filename &fileRef. dde 'excel|system';
%mend;
%macro EstablishWorkbookLink(fileRef, dirData, fileName, sheetName, range);
filename &fileRef. dde "excel|&dirData.\[&fileName.]&sheetName.!&range.";
%mend;
%macro IsEmpty(macroVariable);
%sysevalf(%superq(&macroVariable)=, boolean)
%mend;
%macro IsFileRef(reference);
%local fileRefExists externalFileExists returnValue;
%let fileRefExists = %sysfunc(fexist(&reference.));
%let externalFileExists = %sysfunc(fileexist(&reference.));
%if &fileRefExists. = 1 and &externalFileExists. = 0 %then %let returnValue = 1;
%else %let returnValue = 0;
&returnValue
%mend;
%macro IsFilePath(reference);
%local fileRefExists externalFileExists returnValue;
%let fileRefExists = %sysfunc(fexist(&reference.));
%let externalFileExists = %sysfunc(fileexist(&reference.));
%if &fileRefExists. = 0 and &externalFileExists. = 1 %then %let returnValue = 1;
%else %let returnValue = 0;
&returnValue
%mend;
%macro GetObsCount(dataset);
%local exists returnValue closed;
%let exists = %sysfunc(open(&dataset));
%if &exists. %then %do;
%let returnValue = %sysfunc(attrn(&exists, nobs));
%let closed = %sysfunc(close(&exists));
%end;
%else %do;
%put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
%abort cancel;
%end;
&returnValue
%mend;
%macro GetVarCount(dataset);
%local exists varCount closed;
%let exists = %sysfunc(open(&dataset));
%if &exists. %then %do;
%let varCount = %sysfunc(attrn(&exists, nvars));
%let closed = %sysfunc(close(&exists));
%end;
%else %do;
%put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
%abort cancel;
%end;
&varCount
%mend;
%macro ListLength(list);
%local count;
%if %sysevalf(%superq(list)=, boolean) %then %let count = 0;
%else %let count = %eval(%sysfunc(countc(&list., |)) + 1);
&count
%mend;
%macro ListElement(list, n);
%local nthElement;
%let nthElement = %sysfunc(scan(%superq(&list.), &n., |, m));
&nthElement
%mend;
%macro RemoveAllFormattingFromSheet(fileRef, sheet);
data _null_;
file &fileRef.;
/* Select sheet of interest */
put "[WORKBOOK.ACTIVATE(""&sheet."")]";
/* Select first cell */
put '[FORMULA.GOTO("R1C1")]';
/* Apply dummy filter of ">2" to first column */
put '[FILTER(1, ">2")]';
/* Disable filters */
put '[FILTER()]';
/* Select all */
put '[SELECT("R[0]C[0]:R[1048575]C[16383]", "R[0]C[0]")]';
/* Unhide rows */
put '[ROW.HEIGHT(,,TRUE, 2)]';
/* Unhide columns */
put '[COLUMN.WIDTH(,,TRUE, 2)]';
/* Remove all formatting */
put '[CLEAR(2)]';
/* Autofit column width */
put '[COLUMN.WIDTH(,,TRUE, 3)]';
run;
%mend;
%macro SetSystemOptions(opt1, opt2, opt3);
options &opt1. &opt2. &opt3.;
%mend;
%macro ImportDataFile(dirData=, fileName=, dataset=, delimiter=, overOption=MISSOVER, headerRow=R1, sheet=, range=, prefix=, case=lower, defLength=100) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ImportDataFile(dirData=&dirData, fileName=&fileName, dataset=&dataset, delimiter=&delimiter, overOption=&overOption, headerRow=&headerRow, sheet=&sheet, range=&range, prefix=&prefix, case=&case, defLength=&defLength);
%local
macroStart
case
extension
HeaderRef
lengthStatement
delimiter
InfileRef
infileStatement
numberOfRecords
numberOfVars
duration
;
%global
listHeader
originalNOTES
originalQUOTELENMAX
;
%let macroStart = %sysfunc(datetime());
%let originalNOTES = %sysfunc(getoption(notes));
%let originalQUOTELENMAX = %sysfunc(getoption(quotelenmax));
%SetSystemOptions(nonotes);
********************************************************************
** Validation
********************************************************************;
%if %IsEmpty(dirData) %then %do;
%put ERROR: [&SYSMACRONAME.] DIRDATA argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if %IsEmpty(fileName) %then %do;
%put ERROR: [&SYSMACRONAME.] FILENAME argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if %IsEmpty(dataset) %then %do;
%put ERROR: [&SYSMACRONAME.] DATASET argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not(%IsEmpty(prefix)) and not(%sysfunc(nvalid(&prefix, v7))) %then %do;
%put ERROR: [&SYSMACRONAME.] Invalid PREFIX="&prefix.";
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%let case = %upcase(&case.);
%if not(&case. in (LOWER, UPPER, MIXED)) %then %do;
%put ERROR: [&SYSMACRONAME.] Invalid case option: &case. Must be LOWER, UPPER, or MIXED.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%let extension = %upcase(%scan(&fileName., 1, '.', b));
%if not(&extension. in (TXT, TSV, CSV, XLS, XLSX)) %then %do;
%put ERROR: [&SYSMACRONAME.] Invalid file type: &extension. Must be TXT, TSV, CSV, XLS, XLSX.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if &extension. in (XLS, XLSX) and %IsEmpty(sheet) %then %do;
%put ERROR: [&SYSMACRONAME.] SHEET argument undefined.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if &extension. in (XLS, XLSX) and %IsEmpty(range) %then %do;
%put ERROR: [&SYSMACRONAME.] RANGE argument undefined.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(sheet)) %then %do;
%put ERROR: [&SYSMACRONAME.] SHEET argument only valid for XLS or XLSX files.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(range)) %then %do;
%put ERROR: [&SYSMACRONAME.] RANGE argument only valid for XLS or XLSX files.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
**********************************
*** Define delimiter
**********************************;
%if %IsEmpty(delimiter) %then %do;
%if &extension. in (XLS, XLSX) %then %let delimiter = '09'x;
%else %if &extension. = CSV %then %let delimiter = ',';
%else %if &extension. in (TXT, TSV) %then %let delimiter = '09'x;
%else %do;
%put ERROR: [&SYSMACRONAME.] Delimiter error.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%end;
%if &extension. in (XLS, XLSX) and &delimiter ^= '09'x %then %do;
%put WARNING: [&SYSMACRONAME.] Delimiter for Excel file must be '09'x.;
%put WARNING: [&SYSMACRONAME.] Delimiter set to '09'x.;
%let delimiter = '09'x;
%end;
********************************************************************
** Prep Excel Worksheet
********************************************************************;
%if &extension. in (XLS, XLSX) %then %do;
%let DDECommandRef = DDEcmd;
%EstablishDDELink(fileRef=&DDECommandRef.);
%RemoveAllFormattingFromSheet(fileRef=&DDECommandRef., sheet=&sheet.);
%end;
********************************************************************
** Get header
********************************************************************;
**********************************
*** Define file reference
**********************************;
%if &extension. in (XLS, XLSX) %then %do;
%let HeaderRef = DDEHead;
%EstablishDDELink(
fileRef= &HeaderRef.
, dirData= &dirData.
, fileName= &fileName.
, sheetName= &sheet.
, range= &headerRow.
);
%end;
%else %if &extension. in (CSV, TXT, TSV) %then
%let HeaderRef = %sysfunc(dequote(&dirData.))\&fileName.;
%ReadHeaderIntoList(reference=&HeaderRef., delimiter=&delimiter., prefix=&prefix., case=&case.);
********************************************************************
** Create length statement
********************************************************************;
%let lengthStatement = %CreateLengthStatement(&listHeader., &defLength.);
********************************************************************
** Import data
********************************************************************;
**********************************
*** Define infile statement
**********************************;
%if &extension. in (XLS, XLSX) %then %do;
%let InfileRef = DDESheet;
%EstablishDDELink(
fileRef= &InfileRef.
, dirData= &dirData.
, fileName= &fileName.
, sheetName= &sheet.
, range= &range.
);
%let infileStatement = infile &InfileRef. dlmstr=&delimiter. dsd notab &overOption.;
%CompareVariablesToDDERange();
%end;
%else %if &extension. in (CSV, TXT, TSV) %then %do;
%let InfileRef = %sysfunc(dequote(&dirData.))\&fileName.;
%let infileStatement = infile "&InfileRef." dlmstr=&delimiter. dsd &overOption. firstobs = 2 end=last_record;
%end;
**********************************
*** Perform import
**********************************;
data &dataset.;
&infileStatement.;
length &lengthStatement. ;
input (_all_) (:) ;
run;
********************************************************************
** Housekeeping
********************************************************************;
%let numberOfRecords = %GetObsCount(&dataset.);
%let numberOfVars = %GetVarCount(&dataset.);
%SetSystemOptions(notes);
%put;
%put NOTE: [MACRO] The dataset WORK.%upcase(&dataset.) has &numberOfRecords. observations and &numberOfVars. variables.;
%put NOTE: [MACRO] IMPORTDATAFILE macro used (Total process time):;
%let duration = %sysfunc(putn(%sysevalf(%sysfunc(datetime()) - &macroStart.), time12.3));
%if %sysfunc(minute("&duration."t)) > 0 %then %do;
%put NO%str(TE-) real time %substr(&duration., 3, 8);
%end;
%else %do;
%put NO%str(TE-) real time %substr(&duration., 6, 5) seconds;
%end;
%put;
%SetSystemOptions(&originalNotes., &originalQUOTELENMAX.);
%mend;
%macro EstablishDDELink(fileRef, dirData, fileName, sheetName, range);
%put NOTE: [&SYSMACRONAME] Executing: EstablishDDELink(fileRef=&fileRef, dirData=&dirData, fileName=&fileName, sheetName=&sheetName, range=&range);
%local dirData linkConnection stopTime closeReturnCode;
********************************************************************
** Validate arguments
********************************************************************;
%if %IsEmpty(fileRef) %then %do;
%put ERROR: [&SYSMACRONAME] fileRef is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if %length(&fileRef.) > 8 %then %do;
%put ERROR: [&SYSMACRONAME] Fileref &fileRef exceeds 8 character limit.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not %IsEmpty(dirData) %then %let dirData = %sysfunc(dequote(&dirData.));
********************************************************************
** Assign fileref according to link type
********************************************************************;
%if %IsEmpty(dirData)
and %IsEmpty(fileName)
and %IsEmpty(sheetName)
and %IsEmpty(range) %then %EstablishSystemLink(&fileRef.);
%else %EstablishWorkbookLink(&fileRef., &dirData., &fileName., &sheetName., &range.);
********************************************************************
** Check that link has been established
********************************************************************;
%let linkConnection = %sysfunc(fopen(&fileRef, S));
%if not (&linkConnection. > 0) %then %do;
/*Run until either Excel opens (linkConnection > 0)
or until 10 seconds have passed.*/
%let stopTime = %sysevalf(%sysfunc(datetime()) + 10);
%do %until (&linkConnection. > 0);
%if (%sysfunc(datetime()) >= &stopTime.) %then %do;
%put ERROR: [&SYSMACRONAME] DDE system link was not established. Operation timed out.;
%ClearFileRef(&fileRef.);
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%let linkConnection = %sysfunc(fopen(&fileRef, S));
%end;
%end;
********************************************************************
** Housekeeping
********************************************************************;
%let closeReturnCode = %sysfunc(fclose(&linkConnection));
%mend;
%macro ReadHeaderIntoList(reference, delimiter, prefix, case) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ReadHeaderIntoList(reference=&reference, delimiter=&delimiter, prefix=&prefix, case=&case);
%local fileSpecification notab delimiter;
%global listHeader;
%SetSystemOptions(nonotes);
%if %IsEmpty(reference) %then %do;
%put ERROR: [&SYSMACRONAME.] REFERENCE argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
********************************************************************
** Determine infile statement options
********************************************************************;
/*SAS filerefs exist only for Excel files*/
%if %IsFileRef(&reference.) %then %do;
%let fileSpecification = &reference.;
%let notab = notab;
%end;
/*Absolute references only for CSV,TXT,TSV files*/
%else %if %IsFilePath(&reference.) %then %do;
%let fileSpecification = "&reference.";
%let notab = ;
%let extension = %upcase(%scan(&reference., 1, '.', b));
%end;
%else %do;
%put ERROR: [&SYSMACRONAME.] Invalid input REFERENCE: [&reference.];
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
********************************************************************
** Read in header
********************************************************************;
data _null_;
infile &fileSpecification. dlmstr = '```#@' &notab. obs = 1 lrecl = 32767 ;
length
raw_header_line $ 32767
raw_with_pipes $ 32767
;
input raw_header_line;
raw_with_pipes = tranwrd(raw_header_line, &delimiter., '|');
call symput('rawListHeader', strip(raw_with_pipes));
run;
********************************************************************
** Transform headers into valid variable names
********************************************************************;
%SetSystemOptions(noquotelenmax);
data _null_;
length
i 8
listLength 8
header_i $ 32767
temp_i $ 32767
listValid $ 32767
;
listLength = %ListLength(%superq(rawListHeader));
do i = 1 to listLength;
header_i = scan("%superq(rawListHeader)", i, '|', 'm');
**********************************
*** Apply prefix
**********************************;
if not missing(header_i) then prefixed_i = cats("&prefix.", header_i);
else prefixed_i = header_i;
**********************************
*** Apply case
**********************************;
if "&case." = "LOWER" then cased_i = lowcase(prefixed_i);
else if "&case." = "UPPER" then cased_i = upcase(prefixed_i);
else cased_i = prefixed_i;
**********************************
*** Keep valid otherwise correct
**********************************;
if nvalid(cased_i, 'v7') then do;
if i = 1 then listValid = cased_i;
else listValid = catx('|', listValid, cased_i);
end;
else do;
**********************************
*** Fill in blank headers
**********************************;
if missing(cased_i) and "&case." = "UPPER" then temp_i = "%upcase(&prefix.)NO_HEADER";
else if missing(cased_i) then temp_i = "&prefix.no_header";
**********************************
*** Replace blanks with _ and
*** Remove invalid characters
**********************************;
else do;
replaced_space_with_underscore = tranwrd(strip(cased_i), ' ', '_');
temp_i = compress(replaced_space_with_underscore, '_', 'kin');
end;
**********************************
*** Make first char _ if digit
**********************************;
if anydigit(temp_i) = 1 then temp_i = cats('_', temp_i);
**********************************
*** Trim length to 32
**********************************;
if length(temp_i) > 32 then temp_i = substr(temp_i, 1, 32);
**********************************
*** Verify valid V7 name
**********************************;
if not nvalid(temp_i, 'v7') then do;
put "ERROR: [&SYSMACRONAME.] Error cleaning header " i +(-1) ". Invalid SAS name.";
call execute('
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
data _null_;
abort cancel nolist;
run;');
stop;
end;
if i = 1 then listValid = temp_i;
else listValid = catx('|', listValid, temp_i);
end;
output;
end;
call symput('listValid', strip(listValid));
run;
********************************************************************
** Append repeated headers with incremented value
********************************************************************;
/*Use hash table with key being each header and value
corresponding to the number of occurrences. Create new
header list as follows: If first occurrence of a header,
add to list. If not first occurrence, ruthlessly append
occurrence number (ensuring validity) and add to list.
Beware: SAS documentation for hashes contains syntax
errors.*/
data _null_;
length
element_i $ 32
item $ 32
occurrences 8
new_list $ 32767
;
declare hash h();
h.defineKey('item');
h.defineData('item', 'occurrences');
h.defineDone();
call missing(item, occurrences);
listLength = input("%ListLength(&listValid.)", 8.);
do i = 1 to listLength;
element_i = scan("&listValid.", i, '|');
if not (h.find(key: element_i) = 0) then do;
h.add(key: element_i, data: element_i, data: 1);
new_list = catx('|', new_list, element_i);
end;
else do;
occurrences + 1;
h.replace(key: element_i, data: element_i, data: occurrences);
len = length(element_i);
digits = ceil(log10(occurrences + 1));
if (len + digits) > 32 then
new_element = cats(substr(element_i, 1, len - digits), occurrences);
else new_element = cats(element_i, occurrences);
new_list = catx('|', new_list, new_element);
end;
end;
call symput('listHeader', strip(new_list));
run;
%mend;
%macro CreateLengthStatement(listHeader, defLength);
%local lengthStatement header_h h;
%let lengthStatement=;
%do h = 1 %to %ListLength(&listHeader.);
%let header_h = %ListElement(listHeader, &h);
%if &h. = 1 %then %let lengthStatement = &header_h. $ &defLength. ;
%else %let lengthStatement = &lengthStatement. &header_h. $ &defLength. ;
%end;
%let lengthStatement = &lengthStatement;
&lengthStatement
%mend;
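¹ This solution makes extensive use of macros. In my experience, people advise avoiding macros; IMHO, that advice is best ignored. SAS has no functions, which makes it hard to build abstractions, and macros let you mimic functions. The usual fear about macros is debugging. Stick to the Single Responsibility Principle and you will find they are not hard to debug at all. Log them with %put statements and you will know which macro was called and when. If you are unfamiliar with macros, they really are just text substitution: the code passes through a preprocessor that replaces the macro code with text, and then that text and the rest of the code are executed. The best resource for learning macros is the manual.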
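As a small illustration of the "macros as functions" idea, here is a hypothetical macro (not part of ImportDataFile) written in the same style as the utilities above: it logs its call with %put, and its body resolves to text, so the call can be used wherever a value is expected.

```sas
/* Hypothetical function-style macro: logs the call, then resolves to text */
%macro Square(x);
    %put NOTE: [&SYSMACRONAME.] Executing: Square(x=&x.);
    %eval(&x. * &x.)
%mend;

/* The macro call is replaced by its resolved text (49) before %LET runs */
%let result = %Square(7);
%put &=result;
```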