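I am struggling to evaluate a SAS Perl regular expression to work out what gets replaced with what. I have gone through the SAS documentation to learn what each metacharacter means. Could someone help me determine what the expression below actually replaces?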
PRXPARSE('s/(^[0-9]+\s)|([#][0-9]+)|(\s[A-Z][0-9]+)|([`''\*\+\-\,\!"#])|([\.](?!BETA))|(DEALER[0-9]+)|(\s[0-9]{3,})|([0-9]+\s*$)//');
The expression above is part of a DATA step used inside a SAS macro.
data &INPUT_DATA_SET (DROP=MATCH1 MATCH2 DEALER1 DEALER2);
set &INPUT_DATA_SET;
LENGTH DEALER1 $ 40. DEALER $ 40.;
if _N_ = 1 then MATCH1 = PRXPARSE('s/(^[0-9]+\s)|([#][0-9]+)|(\s[A-Z][0-9]+)|([`''\*\+\-\,\!"#])|([\.](?!COM))|(STORE[0-9]+)|(\s[0-9]{3,})|([0-9]+\s*$)//');
if _N_ = 1 then MATCH2 = PRXPARSE("s/\s+/ /");
RETAIN MATCH1;
RETAIN MATCH2;
call PRXCHANGE(MATCH1, -1, &DEALER_NAME_FIELD, DEALER1);
call PRXCHANGE(MATCH2, -1, DEALER1, DEALER);
run;
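I would appreciate an explanation of how the string replacement happens in the first PRXPARSE expression.
Thanks in advance. Naga Vemprala
Answer 0 (score: 3)
I have only restructured the regex for explanation purposes. Do not replace your code with the code below: if a regex is split across several lines and concatenated, the leading and trailing blanks of each piece must be removed before the result is passed to the PRXPARSE function.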
PRXPARSE('s/(^[0-9]+\s)| /* one or more digits at the very start of the string, together with the single whitespace character that follows them */
([#][0-9]+)| /* or a pound sign (#) followed by one or more digits; the # is removed along with the digits */
(\s[A-Z][0-9]+)| /* or a whitespace character, one uppercase letter and one or more digits; all three parts are removed */
([`''\*\+\-\,\!"#])| /* or any single one of the characters `'*+-,!"# (the doubled '' is just an escaped single quote inside the SAS string literal) */
([\.](?!COM))| /* or any period that is NOT immediately followed by COM (negative lookahead), so the dot in .COM is kept */
(STORE[0-9]+)| /* or the literal text STORE followed by one or more digits */
(\s[0-9]{3,})| /* or a whitespace character (space, tab, line break) followed by three or more digits */
([0-9]+\s*$) /* or one or more digits followed by zero or more whitespace characters at the end of the string */
//'); /* the replacement part is empty, so text matched by any of the alternatives above (tried left to right) is removed from the value */
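If you do want to keep the pattern split over several lines in real code, one possible approach is to build it from separate string literals so that the comments sit outside the pattern. The following is only a sketch: using CATS for the concatenation and the `_null_` test step are my own choices, not something from the original macro. CATS strips leading and trailing blanks from each piece, which is safe here because the pattern uses `\s` rather than literal blanks.

```sas
data _null_;
    /* Assumption: CATS is used to join the pieces; it removes leading and   */
    /* trailing blanks from every argument before concatenating them.        */
    match1 = prxparse(cats(
        's/',
        '(^[0-9]+\s)|',          /* leading digits plus one whitespace character */
        '([#][0-9]+)|',          /* # followed by digits                         */
        '(\s[A-Z][0-9]+)|',      /* whitespace, one uppercase letter, digits     */
        '([`''\*\+\-\,\!"#])|',  /* any one of the listed punctuation characters */
        '([\.](?!COM))|',        /* a period not followed by COM                 */
        '(STORE[0-9]+)|',        /* STORE followed by digits                     */
        '(\s[0-9]{3,})|',        /* whitespace followed by three or more digits  */
        '([0-9]+\s*$)',          /* trailing digits and optional whitespace      */
        '//'                     /* empty replacement: matches are deleted       */
    ));

    /* PRXPARSE returns a missing value when the pattern fails to compile. */
    if missing(match1) then putlog 'ERROR: pattern did not compile.';
    else putlog 'NOTE: pattern compiled, id=' match1;
run;
```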
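The regular expression has to be written in the form 's/match/replacement/' for it to work with the PRXCHANGE function or the CALL PRXCHANGE routine. Also, because you call CALL PRXCHANGE with -1 as the second argument, every match of the regex in the input variable is replaced, not just the first one. Since the replacement part at the end of the regex is empty, all of those matches are simply removed from the result variable.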
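To see the two substitutions in action, a small self-contained test step such as the one below can help. It is a sketch under assumptions: the data set name `demo`, the variable names `raw`, `step1` and `cleaned`, and the three sample dealer names are all made up for illustration; only the two patterns and the CALL PRXCHANGE calls mirror the code in the question.

```sas
data demo;
    length raw step1 cleaned $ 60;
    retain match1 match2;
    drop match1 match2;

    /* Compile both patterns once, on the first iteration only. */
    if _n_ = 1 then do;
        match1 = prxparse('s/(^[0-9]+\s)|([#][0-9]+)|(\s[A-Z][0-9]+)|([`''\*\+\-\,\!"#])|([\.](?!COM))|(STORE[0-9]+)|(\s[0-9]{3,})|([0-9]+\s*$)//');
        match2 = prxparse('s/\s+/ /');
    end;

    infile datalines truncover;
    input raw $char60.;

    /* -1 means "replace every match"; the result goes into the 4th argument. */
    call prxchange(match1, -1, raw, step1);      /* delete the unwanted pieces   */
    call prxchange(match2, -1, step1, cleaned);  /* collapse runs of whitespace  */

    datalines;
12 ACME MOTORS #45
BEST-BUY.COM STORE12
JOE'S AUTO, INC. 2001
;
run;

proc print data=demo;
run;
```

In the first sample row, for instance, the leading `12 ` is matched by the first alternative and `#45` by the second, so both should disappear from `step1`. In the second row the period survives because it is followed by COM.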
I suggest using an online regex testing tool to run, validate, and build your regular expressions before running them in SAS, for example http://www.regexr.com/v1/.
Hope this helps.