Pattern matching on a character variable using SAS

Time: 2017-01-12 18:30:10

Tags: sas

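My dataset has a character variable, PANCARD, with observations such as:

FGHIU9635F
DFGHI6953G
ZXCVB6325F
XCVBN9658G
DVHIGF963F
LPMJI44444

I want to use pattern matching in SAS to extract the valid values from 10,000 records, so that I get only:

FGHIU9635F
DFGHI6953G
ZXCVB6325F
XCVBN9658G

The conditions are:

1) The first 5 characters must be letters.
2) The next 4 characters must be digits.
3) The last character must be a letter.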

2 answers:

Answer 0: (score: -1)

Using Perl regular expressions in SAS is the most efficient approach I have found. Here is a good tutorial:

An Introduction to Perl Regular Expressions in SAS 9

In detail...

     data match_values;
       /* list input with the colon modifier skips leading blanks in the data lines */
       input pancard :$15.;

       /*------
         Parse the patterns once, on row 1, rather than on every row
         ------*/
       retain p_1 p_2 p_3 p_all;
       if _n_=1 then 
       do;
         rule_pttrn_1 = "^[a-zA-Z]{5}";      /* first 5 characters are letters */
         rule_pttrn_2 = "^[a-zA-Z]{5}\d{4}"; /* first 5 letters, next 4 characters are digits */ 
         rule_pttrn_3 = ".*[a-zA-Z]$";       /* whatever comes before, the last character is a letter */

         /* parse all rules */
         p_1   = prxparse("/" || rule_pttrn_1 || "/");
         p_2   = prxparse("/" || rule_pttrn_2 || "/");
         p_3   = prxparse("/" || rule_pttrn_3 || "/");
         /* all three rules at once: exactly 5 letters, 4 digits, 1 letter.
            No ".*" before the last letter, or extra characters would slip through. */
         p_all = prxparse("/^[a-zA-Z]{5}\d{4}[a-zA-Z]$/");
       end;

       /*-----
         test which patterns match (0 = no match, otherwise the match position)
         -----*/    
       match1   =prxmatch(p_1  ,strip(pancard));
       match2   =prxmatch(p_2  ,strip(pancard));
       match3   =prxmatch(p_3  ,strip(pancard));
       match_all=prxmatch(p_all,strip(pancard));

       /* keep only rows that match all rules; of the test rows below, only FGHIU9635F does */
       if match_all then output;
       keep pancard match:;

       cards;
       FGHIU9635F
       5DFGHI69530D
       $XCV66325F
       XCVBN96950R
       DVHITGF963
       LPMJI44444
       ;
    run;
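
The question is about an existing dataset of 10,000 records rather than inline test data, so here is a minimal sketch of the same idea applied with a SET statement. The dataset names `have` and `want` are placeholders I made up; only the variable name PANCARD comes from the question. The pattern is anchored to exactly the three conditions as stated (5 letters, 4 digits, 1 letter), which is my reading of the requirement.

```sas
/* Sketch only: HAVE and WANT are hypothetical dataset names. */
data want;
  set have;

  /* Compile the pattern once on the first row and retain its id. */
  retain re;
  if _n_ = 1 then re = prxparse('/^[a-z]{5}\d{4}[a-z]$/i');

  /* Subsetting IF: keep the row only when the whole value matches.
     STRIP removes the trailing blanks that pad a character variable,
     otherwise the $ anchor would not line up with the last letter. */
  if prxmatch(re, strip(pancard));

  drop re;
run;
```

The `i` suffix makes the match case-insensitive; drop it and use `[A-Z]` if lowercase values should be rejected.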

Answer 1: (score: -1)

data have;
   input x $15.;
   /* 5 letters, 4 digits, 1 letter; the i suffix makes the match case-insensitive */
   if prxmatch('/^[a-z]{5}[0-9]{4}[a-z]$/i',strip(x))>0 then output;
   cards;
   FGHIU9635F
   5DFGHI69530D
   $XCV66325F
   XCVBN96950R
   DVHITGF963
   LPMJI44444
   ;
run;
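
Neither answer tests against the six values given in the question, so here is a small sketch that does. It flags each value instead of filtering, which makes it easy to see which ones pass. The dataset name `check` and the flag variable `ok` are names I chose for illustration.

```sas
/* Sketch only: CHECK and OK are hypothetical names. */
data check;
  /* The colon modifier reads up to 15 characters and skips leading blanks. */
  input pancard :$15.;

  /* ok = 1 when the value is exactly 5 letters, 4 digits, 1 letter; else 0 */
  ok = (prxmatch('/^[a-z]{5}\d{4}[a-z]$/i', strip(pancard)) > 0);

  datalines;
FGHIU9635F
DFGHI6953G
ZXCVB6325F
XCVBN9658G
DVHIGF963F
LPMJI44444
;
run;

proc print data=check;
run;
```

The first four values should come out with ok=1. DVHIGF963F fails because its sixth character is a letter, and LPMJI44444 fails because its last character is a digit.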