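My dataset has a character variable "PANCARD" with these observations:
FGHIU9635F
DFGHI6953G
ZXCVB6325F
XCVBN9658G
DVHIGF963F
LPMJI44444
I want to use pattern matching in SAS to extract records from a dataset of 10,000 rows, so that I get only:
FGHIU9635F
DFGHI6953G
ZXCVB6325F
XCVBN9658G
The conditions are:
1) The first 5 characters should be letters.
2) The next 4 characters should be digits.
3) The last character should be a letter.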
Answer 0 (score: -1)
Perl regular expressions are the most efficient approach I have found in SAS. Here is a good tutorial:
An Introduction to Perl Regular Expressions in SAS 9
In more detail:
data match_values;
input pancard $15.;
/*------
Parse the patterns once, while processing row 1, rather than on every row
------*/
retain p_1 p_2 p_3 p_all;
if _n_=1 then
do;
rule_pttrn_1 = "^[a-zA-Z]{5}"; /* first 5 characters should be letters */
rule_pttrn_2 = "^[a-zA-Z]{5}\d{4}"; /* first 5 letters, next 4 characters should be digits */
rule_pttrn_3 = ".*[a-zA-Z]$"; /* no matter what comes before, the last character is a letter */
/* parse all rules */
p_1 = PRXPARSE("/" || rule_pttrn_1 || "/");
p_2 = PRXPARSE("/" || rule_pttrn_2 || "/");
p_3 = PRXPARSE("/" || rule_pttrn_3 || "/");
p_all = prxparse("/^[a-zA-Z]{5}\d{4}[a-zA-Z]$/"); /* all rules together: exactly 5 letters, 4 digits, 1 letter */
end;
/*-----
test which patterns match
-----*/
match1    = prxmatch(p_1,   strip(pancard));
match2    = prxmatch(p_2,   strip(pancard));
match3    = prxmatch(p_3,   strip(pancard));
match_all = prxmatch(p_all, strip(pancard));
/* keep only rows that match all rules */
if match_all then output;
keep pancard match:;
cards;
FGHIU9635F
5DFGHI69530D
$XCV66325F
XCVBN96950R
DVHITGF963
LPMJI44444
;
run;
Answer 1 (score: -1)
data have;
input x $15.;
/* 5 letters, 4 digits, 1 letter; the /i modifier makes [a-z] match upper case too */
if prxmatch('/^[a-z]{5}[0-9]{4}[a-z]$/i',strip(x))>0 then output;
cards;
FGHIU9635F
5DFGHI69530D
$XCV66325F
XCVBN96950R
DVHITGF963
LPMJI44444
;
run;
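A minimal sketch of the same PRXMATCH idea applied to the six sample values from the question, using one anchored pattern for all three conditions. The dataset names have and want are illustrative only; with real data, only the second DATA step is needed, with SET pointing at the actual 10,000-record dataset.

```sas
/* Illustrative input: the six PANCARD values from the question.        */
/* "have" and "want" are hypothetical dataset names; replace as needed. */
/* Data lines start in column 1 so the $10. informat reads all 10 chars. */
data have;
  input pancard $10.;
  datalines;
FGHIU9635F
DFGHI6953G
ZXCVB6325F
XCVBN9658G
DVHIGF963F
LPMJI44444
;
run;

data want;
  set have;
  /* ^ and $ anchor the whole value: 5 letters, 4 digits, 1 letter.  */
  /* /i lets [a-z] match upper-case letters as well.                  */
  /* STRIP removes the blank padding SAS adds to character values,    */
  /* so $ lands directly after the final letter.                      */
  if prxmatch('/^[a-z]{5}\d{4}[a-z]$/i', strip(pancard)) then output;
run;

proc print data=want;
run;
```

With these values, want should keep FGHIU9635F, DFGHI6953G, ZXCVB6325F and XCVBN9658G, and drop DVHIGF963F (its sixth character is a letter) and LPMJI44444 (its last character is a digit). Because the pattern here is a literal constant, the SAS documentation for PRXMATCH states it is compiled only once, so a separate PRXPARSE/RETAIN step is optional in this case.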