How to split one variable into multiple rows

Time: 2015-06-14 10:49:51

Tags: sas

**Application_id    Reason_code     Value**
123                 AB31AB45        £500
124                 AB43RD49TY87    £640
125                 RT87            £900
126                 CD19RV29        £1000

I want to split the reason_code variable into its individual codes. Each reason is always exactly 4 characters: 2 letters followed by 2 digits.

The dataset I want to end up with looks like this:

Application_id    Reason_code    Value
123               AB31           £500
123               AB45           £500
124               AB43           £640
124               RD49           £640
124               TY87           £640
125               RT87           £900

I hope this makes sense.

Second question: I would also like to create flags like these:

Application_id    Reason_code    Value    Waterfall_reason             Unique_Reason
123               AB31           £500     1 (as it hits AB31 first)    0 (as it hits both AB31 and AB45)
123               AB45           £500     0 (as it hits AB31 first)    0 (as it hits both AB31 and AB45)
124               AB43           £640     1 (as it hits AB43 first)    0 (as it hits all of AB43, RD49 and TY87)
124               RD49           £640     0                            0
124               TY87           £640     0                            0
125               RT87           £900     1 (as it hits RT87 first)    1 (as it ONLY hits RT87)

3 answers:

Answer 0: (score: 3)

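Assuming all of the codes are 4 characters long, a simple DO loop will do the job. Just keep taking the first four characters until the string is empty. If you create a variable with a length of only 4 and assign a longer string to it, it keeps only the first four characters. You can then use the SUBSTR() function to remove those first four characters before the next pass through the loop.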

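data have ;
  input ID Reason_Code :$20. Value ;
cards;
123 AB31AB45 500
124 AB43RD49TY87 640
125 RT87 900
126 CD19RV29 1000
;;;;
data want ;
  /* keep the original concatenated string under a new name */
  set have (rename=(reason_code=reason_list));
  length Reason_code $4 Waterfall_reason 8 Unique_reason 8;
  /* 1 when the list holds a single 4-character code */
  unique_reason = length(reason_list)<= 4;
  /* only the first code written for each input row gets the waterfall flag */
  waterfall_reason= 1;
  do until (reason_list=' ');
    /* assigning to a $4 variable keeps just the first four characters */
    reason_code = reason_list ;
    output;
    waterfall_reason=0;
    /* drop the code that was just written out */
    reason_list = substr(reason_list,5);
  end;
run;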
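
For comparison, here is a minimal sketch of the same split that indexes into the string with a counter instead of trimming it on each pass. This is a variation, not the code from the answer above: the names want_indexed, n_codes and i are made up for illustration, and it assumes every code is exactly 4 characters. Unlike the DO UNTIL version, it writes no row at all when Reason_Code is blank.

```sas
/* Sample data, same as in the answer above */
data have;
  input ID Reason_Code :$20. Value;
cards;
123 AB31AB45 500
124 AB43RD49TY87 640
125 RT87 900
126 CD19RV29 1000
;;;;

/* Hypothetical variant: step through the list 4 characters at a time */
data want_indexed;
  set have (rename=(reason_code=reason_list));
  length Reason_code $4 Waterfall_reason 8 Unique_reason 8;
  /* number of complete 4-character codes (assumption: fixed width) */
  n_codes = int(length(reason_list) / 4);
  unique_reason = (n_codes <= 1);
  do i = 1 to n_codes;
    /* the i-th code starts at position 4*(i-1)+1 */
    reason_code = substr(reason_list, 4*(i-1) + 1, 4);
    waterfall_reason = (i = 1);
    output;
  end;
  drop i n_codes;
run;
```

The counter version leaves reason_list untouched on every output row, which can be handy if you want to keep the original string alongside each split code.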

Answer 1: (score: 0)

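Here is another approach using regular expressions. It rests on a different assumption: that your substrings follow a letters+digits pattern rather than a fixed 4-character layout. The code below picks up each string that matches the letters+digits pattern (which includes the 2 letters + 2 digits case), one after another, until the whole input string has been consumed. 'waterfall_reason' is flagged only for the first substring picked up, and 'unique_reason' is derived with countw() using letters as the delimiters.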

data have;
    input ID Reason_Code :$20. Value;
    cards;
123 ABcd31AB45 500
124 AB43RD49T87 640
125 RT87 900
126 C19RV29 1000
;;;;

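data want;
    set have;
    /* one or more letters followed by one or more digits; i = ignore case, o = compile once */
    _pat=prxparse('/[a-z]+[0-9]+/io');
    _start=1;
    _stop=length(reason_code);
    /* with letters as delimiters, each code leaves one "word" of digits */
    unique_reason=ifn(countw(reason_code,,'a')=1,1,0);

    do _n=1 by 1 until (_pos = 0);
        /* find the next match; _pos is 0 when nothing is left */
        call prxnext(_pat,_start,_stop,reason_code,_pos,_len);

        /* only call SUBSTR on a real match, so position 0 never triggers an invalid-argument note */
        if _pos > 0 then do;
            new_code=substr(reason_code,_pos, _len);
            waterfall_reason=ifn(_n=1,1,0);
            output;
        end;
    end;

    drop _:;
run;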
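
To see why countw() with the 'a' modifier works as a "how many codes" counter, here is a small sketch you can run on its own. The variable names code and n_codes are just for illustration. With letters as the only delimiters, each letters+digits code contributes one run of digits, so the word count should equal the number of codes.

```sas
/* Illustration only: count codes by treating letters as delimiters */
data _null_;
  length code $20;
  do code = 'AB31AB45', 'AB43RD49TY87', 'RT87';
    /* second argument left empty, so only the 'a' (alphabetic) modifier defines delimiters */
    n_codes = countw(code,,'a');
    put code= n_codes=;
  end;
run;
```

For these three values the log should show counts of 2, 3 and 1, so only the last one would get unique_reason = 1.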

Answer 2: (score: -1)

return String.format("Hi %s, my name is %s", yourName, name);