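Two pattern groups with a PRXPARSE regular expression?

Date: 2014-10-31 11:45:51

Tags: regex sas

Following on from a question I asked earlier today, I now have my state-scanning regular expression working exactly as I want. I now want to bring state codes into the regex. I want the state names to be case-insensitive and the state codes to be case-sensitive, so I have put two pattern groups with different case settings in my regex.

When used alone in the regex, both groups work as expected, but when I try to use both groups together, only the second one (the state names) finds matches. The code is below: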

options noquotelenmax;

data countries;
do i = 1 to 20;
output;
end;
run;

data countries;
length state $50.;
set countries;

if i = 1 then state = 'CALIFORNIA';
if i = 2 then state = 'alabama';
if i = 3 then state = 'NewYork';
if i = 4 then state = 'OHIO';
if i = 5 then state = 'ohio';
if i = 6 then state = 'FLORIDA';
if i = 7 then state = 'georgia';
if i = 8 then state = 'TEXAS';
if i = 9 then state = 'Kansas';
if i = 10 then state = 'MAINE';

if i = 11 then state = 'AL';
if i = 12 then state = 'AK';
if i = 13 then state = 'CO';
if i = 14 then state = 'MT';
if i = 15 then state = 'OH';
if i = 16 then state = 'SD';
if i = 17 then state = 'PA';
if i = 18 then state = 'IA';
if i = 19 then state = 'PW';
if i = 20 then state = 'AP';

run;

data countries;
set countries;
prx_1 = (prxparse(
"/^(?:AL|AK|AZ|AR|
CA|CO|CT|DE|
DC|FL|GA|HI|
ID|IL|IN|IA|
KS|KY|LA|ME|
MD|MA|MI|MN|
MS|MO|MT|NE|
NV|NH|NJ|NM|
NY|NC|ND|OH|
OK|OR|PA|RI|
SC|SD|TN|TX|
UT|VT|VA|WA|
WV|WI|WY|AS|
GU|MP|PR|VI|
UM|FM|MH|PW|
AA|AE|AP|CM|
CZ|NB|PI|TT|)(?i:Alabama|Alaska|Arizona|Arkansas|
California|Colorado|Connecticut|Delaware|
District\s*of\s*Columbia|Florida|Georgia|Hawaii|
Idaho|Illinois|Indiana|Iowa|Kansas|
Kentucky|Louisiana|Maine|Maryland|
Massachusetts|Michigan|Minnesota|Mississippi|
Missouri|Montana|Nebraska|Nevada|
New\s*Hampshire|New\s*Jersey|New\s*Mexico|
New\s*York|North\s*Carolina|North\s*Dakota|
Ohio|Oklahoma|Oregon|Pennsylvania|
Rhode\s*Island|South\s*Carolina|South\s*Dakota|
Tennessee|Texas|Utah|Vermont|Virginia|
Washington|West\s*Virginia|Wisconsin|Wyoming|
American\s*Samoa|Guam|Northern\s*Mariana\s*Islands|
Puerto\s*Rico|Virgin\s*Islands|
U\s*S\s*\s*Minor\s*Outlying\s*Islands|
Federated\s*States\s*of\s*Micronesia|Marshall\s*Islands|
Palau)$/"));
prx_valid_addr_1 = (prxmatch(prx_1, strip(state))) ;
run;

options quotelenmax;

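Can anyone see what I'm doing wrong?

Thanks

1 Answer:

Answer 0 (score: 3)

You have an extra "|" after "TT" and no "|" between the two groups. Without a "|" between them, the two groups are concatenated, so the pattern asks for a state code followed by a state name. The trailing "|" gives the first group an empty alternative, which is why names on their own still match while codes on their own do not. The following should work: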

data countries;
set countries;
prx_1 = (prxparse(
"/^(?:AL|AK|AZ|AR|
CA|CO|CT|DE|
DC|FL|GA|HI|
ID|IL|IN|IA|
KS|KY|LA|ME|
MD|MA|MI|MN|
MS|MO|MT|NE|
NV|NH|NJ|NM|
NY|NC|ND|OH|
OK|OR|PA|RI|
SC|SD|TN|TX|
UT|VT|VA|WA|
WV|WI|WY|AS|
GU|MP|PR|VI|
UM|FM|MH|PW|
AA|AE|AP|CM|
CZ|NB|PI|TT)|(?i:Alabama|Alaska|Arizona|Arkansas|
California|Colorado|Connecticut|Delaware|
District\s*of\s*Columbia|Florida|Georgia|Hawaii|
Idaho|Illinois|Indiana|Iowa|Kansas|
Kentucky|Louisiana|Maine|Maryland|
Massachusetts|Michigan|Minnesota|Mississippi|
Missouri|Montana|Nebraska|Nevada|
New\s*Hampshire|New\s*Jersey|New\s*Mexico|
New\s*York|North\s*Carolina|North\s*Dakota|
Ohio|Oklahoma|Oregon|Pennsylvania|
Rhode\s*Island|South\s*Carolina|South\s*Dakota|
Tennessee|Texas|Utah|Vermont|Virginia|
Washington|West\s*Virginia|Wisconsin|Wyoming|
American\s*Samoa|Guam|Northern\s*Mariana\s*Islands|
Puerto\s*Rico|Virgin\s*Islands|
U\s*S\s*\s*Minor\s*Outlying\s*Islands|
Federated\s*States\s*of\s*Micronesia|Marshall\s*Islands|
Palau)$/"));
prx_valid_addr_1 = (prxmatch(prx_1, strip(state))) ;
run;
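
To make the difference easier to see, here is a reduced sketch that compares the pattern shapes on a handful of values. It is not the poster's code: three codes and three names stand in for the full lists, and the data set name prx_demo and the rx_* and m_* variables are made up for illustration. The third pattern is an assumption that goes beyond the answer. In the corrected regex, "^" applies only to the code group and "$" only to the name group, so values such as "OHxyz" or "New Ohio" still match; wrapping both groups in one outer group is one way to anchor both ends, if that is what you want.

```sas
/* Reduced sketch: three codes and three names stand in for the full lists. */
data prx_demo;
    length state $20;
    input state $20.;

    /* Shape from the question: trailing "|" in the code group, */
    /* and no "|" between the two groups (they are concatenated). */
    rx_concat   = prxparse("/^(?:AL|OH|TX|)(?i:Alabama|Ohio|Texas)$/");

    /* Shape from the answer: the two groups are alternatives. */
    rx_alt      = prxparse("/^(?:AL|OH|TX)|(?i:Alabama|Ohio|Texas)$/");

    /* Assumed stricter variant: an outer group makes ^ and $ apply to both. */
    rx_anchored = prxparse("/^(?:(?:AL|OH|TX)|(?i:Alabama|Ohio|Texas))$/");

    /* prxmatch returns the match position (0 if none); store 1/0 flags. */
    m_concat   = prxmatch(rx_concat,   strip(state)) > 0;
    m_alt      = prxmatch(rx_alt,      strip(state)) > 0;
    m_anchored = prxmatch(rx_anchored, strip(state)) > 0;

    drop rx_:;
    datalines;
OH
ohio
oh
OHxyz
New Ohio
OHohio
;
run;

proc print data=prx_demo noobs;
run;
```

With these six values, m_concat should be 1 only for "ohio" and "OHohio", m_alt should be 1 for everything except "oh", and m_anchored should be 1 only for "OH" and "ohio".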