Replacing Multiple Words in a String in SAS

Date: 2016-05-03 20:09:33

Tags: arrays sas

I am trying to search a string for multiple words and remove any that are found. The code below seems to work for some words but not all, and when it does work, it only removes the last word in the string.

data readyinput;
set readyforstreetname(obs=200);

array cw (48) $11 (' ave ',' avenue ',' blvd ',' boulevard ',' cir ',' circle ',' court ',' ct ',' drive ',' dr ',' e ',' east ',' highway ',' hwy ',' lane ',' ln ',' north ',' n ',' nw ',' northwest ',' parkway ',' pkwy ',' pl ',' place ',' pl ',' plaza ',' rd ',' road ',' route ',' route ',' rte ',' rte ',' rt ',' rt ',' s ',' south ',' se ',' southeast ',' st ',' street ',' suite ',' ste ',' sw ',' southwest ',' w ',' west ',' apartment ',' apt ');

do i=1 to dim(cw);

if indexw(lowcase(address_input),cw[i])
then 
do;
    add = upcase(tranwrd(lowcase(address_input),cw[i],''));
end;    
end;


drop    cw:;
run;

Basically what I'm trying to do is strip an address of all common words then parse out the street number and street name, which would be done in a later step.

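2 answers:

Answer 0 (score: 1)

Your problem is that each time you try to remove a word, you start from the original string rather than from the string already modified by the previous words.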

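/* Work on a copy so that each removal builds on the previous one */
add = lowcase(address_input);
do i=1 to dim(cw);
  /* STRIP drops the padding of the $11 array element before matching */
  if indexw(add,strip(cw[i])) then
    add = tranwrd(add,' '||strip(cw[i])||' ',' ');
end;
add = upcase(add);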
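
A separate pitfall worth checking, not raised in the answer itself: TRANWRD does not trim trailing blanks from its target argument. An element of a $11 array such as ' ave ' is therefore searched for together with its padding, and it can only match where the word is followed by that many blanks, which is typically at the end of the value. The minimal sketch below compares a padded target with one rebuilt via STRIP. The address and the variable names are made up for illustration.

```sas
data _null_;
  length padded $11 address result_padded result_trimmed $40;
  padded  = ' ave ';               /* stored as ' ave' plus seven trailing blanks */
  address = '12 fifth ave apt 3';  /* hypothetical sample value */

  /* Target keeps its padding, so it is not found in the middle of the string */
  result_padded  = tranwrd(address, padded, ' ');

  /* Target rebuilt as blank + word + blank, without the padding */
  result_trimmed = tranwrd(address, ' ' || strip(padded) || ' ', ' ');

  put result_padded= / result_trimmed=;
run;
```

If the log shows the first result unchanged and the second without "ave", the padding is the likely reason the original code only removed a word at the end of the string.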

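You might also need to change how you find and replace the words. I find that INDEXW() works better when you specify a non-blank word delimiter.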

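data test ;
  /* Words to remove, stored without surrounding blanks */
  array cw (2) $10 _temporary_ ('N','ST');
  input address $80. ;
  new=address;
  /* Collapse repeated blanks, upcase, turn blanks into pipes, wrap in pipes */
  new = cats('|',translate(upcase(left(compbl(new))),'|',' '),'|');
  do i=1 to dim(cw) ;
    /* Search for |WORD| so that only whole words are matched */
    if indexw(new,cats('|',cw(i),'|'),'|') then
      new=tranwrd(new,cats('|',cw(i),'|'),'|')
    ;
  end;
  /* Turn the pipes back into blanks */
  new = translate(new,' ','|');
  put address= / new= ;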
cards;
N Main St
;;;;

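Answer 1 (score: 0)

Not sure an array is any simpler. You can loop through the list of words to remove and replace each one with a blank. You may also want to apply COMPBL to the resulting address variable so that any double or triple spaces are removed at the end.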

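%let words_to_ignore = "word1" "word2" "word3" ... "wordN";

%macro remove_words;
    %local i this_word;

    data your_data2;
        set your_data;
        /* Generate one TRANWRD call per quoted word in the list */
        %do i = 1 %to %sysfunc(countw(&words_to_ignore.));
            %let this_word = %scan(&words_to_ignore., &i.);
            /* this_word already carries its own quotes, so do not quote it again */
            address = tranwrd(address, &this_word., "");
        %end;
        /* Collapse the runs of blanks left behind by the removals */
        address = compbl(address);
    run;

%mend remove_words;

%remove_words;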
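
Because the macro above applies TRANWRD to raw substrings, a short list entry can also match inside a longer word. A different approach, not taken from either answer, is to rebuild the address word by word and keep only the words that are not in a stop list. The sketch below assumes a blank-separated, upper-case list held in a hypothetical variable `stoplist` and uses a made-up sample address. It illustrates the idea and is not a tested replacement for the code above.

```sas
data _null_;
  length address cleaned $80 word $40 stoplist $200;
  stoplist = 'N ST AVE APT';     /* hypothetical stop list, upper case */
  address  = 'N Main St Apt 4';  /* hypothetical sample value */
  cleaned  = '';

  /* Visit each blank-separated word of the address in turn */
  do i = 1 to countw(address, ' ');
    word = scan(address, i, ' ');
    /* Keep the word only if it is not a whole-word match in the stop list */
    if indexw(stoplist, upcase(strip(word))) = 0 then
      cleaned = catx(' ', cleaned, word);
  end;

  put address= / cleaned=;
run;
```

Since CATX joins the kept words with a single blank, no separate COMPBL step is needed in this variant.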