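Evaluating SAS PRXPARSE and PRXCHANGE statements

Date: 2015-03-13 04:28:28

Tags: regex perl sas

I am struggling to evaluate SAS Perl regular expressions to work out what gets replaced with what. I have gone through the SAS documentation to learn what each metacharacter stands for. However, could someone help me work out what the expression below replaces?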

PRXPARSE('s/(^[0-9]+\s)|([#][0-9]+)|(\s[A-Z][0-9]+)|([`''\*\+\-\,\!"#])|([\.](?!BETA))|(DEALER[0-9]+)|(\s[0-9]{3,})|([0-9]+\s*$)//');

The expression above is part of a data step used inside a SAS macro.

data &INPUT_DATA_SET (DROP=MATCH1 MATCH2 DEALER1 DEALER2);
        set &INPUT_DATA_SET;
        LENGTH DEALER1 $ 40. DEALER $ 40.;
        if _N_ = 1 then MATCH1 = PRXPARSE('s/(^[0-9]+\s)|([#][0-9]+)|(\s[A-Z][0-9]+)|([`''\*\+\-\,\!"#])|([\.](?!COM))|(STORE[0-9]+)|(\s[0-9]{3,})|([0-9]+\s*$)//');
        if _N_ = 1 then MATCH2 = PRXPARSE("s/\s+/ /");
        RETAIN MATCH1;
        RETAIN MATCH2;
        call PRXCHANGE(MATCH1, -1, &DEALER_NAME_FIELD, DEALER1);
        call PRXCHANGE(MATCH2, -1, DEALER1, DEALER);
run;

I would appreciate an explanation of how the string replacement in the first PRXPARSE expression takes place.

Thanks in advance. Naga Vemprala

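1 answer:

Answer 0: (score: 3)

I have only restructured the regular expression for the purpose of explanation. Do not replace your code with the code below: if a regex is split across multiple lines and concatenated, the leading and trailing whitespace of each piece has to be removed before the result is passed to the PRXPARSE function.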

PRXPARSE('s/(^[0-9]+\s)|       /*    one or more digits at the start of the string, followed by one whitespace character */
        ([#][0-9]+)|           /* or a pound sign (#) followed by one or more digits; the # is removed as well */
        (\s[A-Z][0-9]+)|       /* or a whitespace character, a single uppercase letter, and one or more digits; all three parts are removed */
        ([`''\*\+\-\,\!"#])|   /* or any single one of the characters ` ' * + - , ! " # (the doubled '' is one quote inside a SAS single-quoted string) */
        ([\.](?!COM))|         /* or a period (.) that is not immediately followed by COM */
        (STORE[0-9]+)|         /* or the string STORE followed by one or more digits */
        (\s[0-9]{3,})|         /* or a whitespace character (space, tab, line break) followed by three or more digits */
        ([0-9]+\s*$)           /* or one or more digits followed by zero or more whitespace characters at the end of the string */

       //');                   /* the replacement is empty, so every substring that matches any of the alternatives above is removed; the pattern has no i modifier, so letters match in uppercase only */
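
If you do want to keep the alternatives on separate lines in real code, one possible approach is to join the pieces with CATS, which strips leading and trailing blanks from each argument. The following is only a sketch: the sample value, the variable names RAW and CLEANED, and the lengths are made up for illustration and are not part of your macro.

```sas
data _null_;
    length raw cleaned $ 60;

    /* Build the substitution from one piece per alternative.       */
    /* CATS removes leading/trailing blanks from every piece.       */
    rx = prxparse(cats('s/(^[0-9]+\s)|',
                       '([#][0-9]+)|',
                       '(\s[A-Z][0-9]+)|',
                       '([`''\*\+\-\,\!"#])|',
                       '([\.](?!COM))|',
                       '(STORE[0-9]+)|',
                       '(\s[0-9]{3,})|',
                       '([0-9]+\s*$)',
                       '//'));

    /* Hypothetical sample value, not taken from the real data */
    raw = '123 ACME MOTORS, INC. #45 STORE12 ACME.COM 9001';

    /* Step 1: remove every match of the first pattern */
    cleaned = prxchange(rx, -1, raw);

    /* Step 2: collapse runs of whitespace into a single blank */
    cleaned = prxchange('s/\s+/ /', -1, cleaned);

    /* Expected in the log: cleaned=ACME MOTORS INC ACME.COM */
    put raw= / cleaned=;
run;
```

In this sample the leading "123 ", the comma, the period after INC, "#45", "STORE12" and the trailing " 9001" are removed, while the period in ".COM" survives because of the negative lookahead.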

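For PRXCHANGE or CALL PRXCHANGE to use it, the regular expression has to be written as a substitution of the form 's/match/replacement/'. Because you call CALL PRXCHANGE with -1 as the second argument, every match of the pattern in the source variable is replaced, not just the first one. And because the replacement part at the end of the regex is empty, each match is simply removed; the cleaned-up value is written to the result variable given as the fourth argument.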
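
To see the effect of the second argument in isolation, here is a minimal sketch using the PRXCHANGE function with a much smaller pattern. The variable names and the sample value are hypothetical and chosen only for illustration.

```sas
data _null_;
    length name first_only all_matches $ 30;

    /* Hypothetical sample value */
    name = 'A-1 AUTO-PARTS!';

    /* times = 1: only the first match is replaced (here: removed) */
    first_only  = prxchange('s/[\-\!]//', 1, name);

    /* times = -1: every match is replaced (here: removed) */
    all_matches = prxchange('s/[\-\!]//', -1, name);

    /* Expected in the log:
       first_only=A1 AUTO-PARTS!
       all_matches=A1 AUTOPARTS   */
    put first_only= / all_matches=;
run;
```

Putting any text between the last two slashes, for example 's/[\-\!]/ /', would replace each match with that text instead of deleting it.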

I suggest using an online regex testing tool to run, verify, and build your regular expressions before you run them in SAS, for example http://www.regexr.com/v1/.

Hope this helps.