SAS: importing a .CSV file that contains line breaks

Posted: 2014-04-28 09:30:33

Tags: sas

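I have a .csv file with line breaks that I want to import into SAS. The problem is that the CUSTOMER values contain spaces and embedded line breaks (multi-line text). How can I get around this? I have some other variables with the same issue; if I import the file manually, it works fine. See the example below, and look at SLN PJ0136 to see the problem.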

SLN     MOD PM  NE      CUSTOMER
32121   GG  1   1   AVAILABLE UPON REQUEST
71403   EN  1   0   JET SUPPORT SERVICE INC.
305173  EN  1   1   UNKNOWN / COTTONWOOD, LLC / J SUPPORT SERVICE, INC.
PJ0136  PS  0   0   "UNKNOWN / GROUP B-50 INC AA
                    TC0004   anada CSC Europe
                    Inglewood Ava" 
EB0162  RG  0   0   ATR

I imported it with INFILE:

DATA WORK.test1;
%let _EFIERR_ = 0; 
INFILE 'C:\Users\26631.IELPWC\Downloads\test.csv'
       delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

    INFORMAT
        SLN  $CHAR6. MOD $CHAR2. PM  BEST1.  NE BEST1. CUSTOMER $CHAR82. ;
    FORMAT
        SLN  $CHAR6.  MOD  $CHAR2. PM  BEST1.  NE   BEST1. CUSTOMER  $CHAR82. ;
    INPUT
        SLN $  MOD $  PM NE CUSTOMER $ ;

   if _ERROR_ then call symputx('_EFIERR_',1);
RUN;

This is the incorrect output I get:

32121   GG  1   1   AVAILABLE UPON REQUEST
71403   EN  1   0   JET SUPPORT SERVICE INC.
305173  EN  1   1   UNKNOWN / COTTONWOOD, LLC / J SUPPORT SERVICE, INC.
PJ0136  PS  0   0   "UNKNOWN / GROUP B-50 INC AA
TC0004      .   .   
24719       .   .   
"       .   .   
EB0162  RG  0   0   ATR

1 answer

Answer (score: 1)

Assuming your input data is in the following format:

SLN,MOD,PM,NE,CUSTOMER
32121,GG,1,1,AVAILABLE UPON REQUEST
71403,EN,1,0,JET SUPPORT SERVICE INC.
305173,EN,1,1,"UNKNOWN / COTTONWOOD, LLC / J SUPPORT SERVICE, INC."
PJ0136,PS,0,0,"UNKNOWN / GROUP B-50 INC AA
TC0004   anada CSC Europe
Inglewood Ava"
EB0162,RG,0,0,ATR

The following SAS code produces the desired output:

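data TEST (drop=_TMP_:);
  /* FORMAT also defines the variables' types and lengths */
  format SLN $6. MOD $2. PM 8. NE 8. CUSTOMER $82. _TMP_STR $100.;
  infile 'input.csv' truncover firstobs=2 dlm=',' dsd lrecl=10000;
  /* Read the first four fields and the first piece of CUSTOMER; hold the line */
  input SLN MOD PM NE _TMP_STR @;
  _TMP_COUNT=0;
  /* An odd number of double quotes means the quoted CUSTOMER value is
     still open, so keep reading until the running count is even */
  do until(mod(_TMP_COUNT, 2) = 0);
    /* Append this piece, separated by a line feed ('0A'x) */
    CUSTOMER=catx('0A'x, CUSTOMER, _TMP_STR);
    _TMP_COUNT=_TMP_COUNT + countc(_TMP_STR, '"');
    if mod(_TMP_COUNT, 2) then do;
      input _TMP_STR; /* read more of the CUSTOMER value */
    end;
  end;
  /* Strip the enclosing double quotes */
  CUSTOMER=dequote(CUSTOMER);
run;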

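Note that the CUSTOMER value for SLN='PJ0136' spans multiple lines (Unix-style line breaks, i.e. '0A'x). If you don't want the line breaks, change the delimiter argument of catx(...).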
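A minimal sketch of that change, assuming you want the pieces of CUSTOMER joined by a single blank instead of a line feed: it is the same data step as the answer, with ' ' as the catx delimiter. To make it runnable on its own, the first step writes the sample data from the answer to a temporary file; the fileref SAMP and the data set name TEST_SPACES are hypothetical names used only for this illustration.

```sas
/* Hypothetical sample file: write the example CSV from the answer to a
   temporary fileref (SAMP) so the step below can run on its own. */
filename samp temp;

data _null_;
  file samp;
  put 'SLN,MOD,PM,NE,CUSTOMER';
  put '32121,GG,1,1,AVAILABLE UPON REQUEST';
  put '71403,EN,1,0,JET SUPPORT SERVICE INC.';
  put '305173,EN,1,1,"UNKNOWN / COTTONWOOD, LLC / J SUPPORT SERVICE, INC."';
  put 'PJ0136,PS,0,0,"UNKNOWN / GROUP B-50 INC AA';
  put 'TC0004   anada CSC Europe';
  put 'Inglewood Ava"';
  put 'EB0162,RG,0,0,ATR';
run;

/* Same quote-counting logic as the answer, but the pieces of CUSTOMER
   are joined with a blank instead of '0A'x, so the value is one line. */
data TEST_SPACES (drop=_TMP_:);
  format SLN $6. MOD $2. PM 8. NE 8. CUSTOMER $82. _TMP_STR $100.;
  infile samp truncover firstobs=2 dlm=',' dsd lrecl=10000;
  input SLN MOD PM NE _TMP_STR @;
  _TMP_COUNT=0;
  do until(mod(_TMP_COUNT, 2) = 0);
    CUSTOMER=catx(' ', CUSTOMER, _TMP_STR);
    _TMP_COUNT=_TMP_COUNT + countc(_TMP_STR, '"');
    if mod(_TMP_COUNT, 2) then input _TMP_STR;
  end;
  CUSTOMER=dequote(CUSTOMER);
run;
```

Any other separator (for example '; ') can be passed as the first argument of catx in the same way.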