SAS - Remove duplicate words from a string

Time: 2017-03-06 03:06:01

Tags: sas

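string = "spanner,span,spaniel,span"; From this string I want to remove all duplicates so that each word is kept only once, and then output the modified string using SAS. The modified string should look like this: var string = "spanner,span,spaniel";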

4 answers:

Answer 0 (score: 1)

data a;
    string = "spanner,span,spaniel,span,abc,span,bcc";
    length word $100;
    i = 2;
    do while(scan(string, i, ',') ^= '');
        word = scan(string, i, ',');
        do j = 1 to i - 1;
            if word = scan(string, j, ',') then do;
                start = findw(string, word, ',', findw(string, word, ',', 't') + 1, 't');
                string = cats(substr(string, 1, start - 2), substr(string, start + length(word)));
                leave;
            end;
        end;
        i = i + 1;
    end;
    keep string;
run;

Answer 1 (score: 1)

First, create a data set with a single column that holds the words. cats() is used to get rid of the blanks.

data temp(keep=text);
  string = "spanner, span, spaniel, span";
  do i=1 to count(cats(string),",")+1;
    text = scan(string,i);
    output;
  end;
run;

Use nodup to eliminate the duplicates (nodupkey works as well).

proc sort data=temp nodup;    
  by text;
run;

Create a macro variable new_string from the unique words.

proc sql noprint;
  SELECT text
  INTO :new_string separated by ","
  FROM temp
  ;
quit;
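
To output the result, the macro variable can be written to the log or copied into a data set variable. This is a minimal sketch, assuming the three steps above have run in the same SAS session; the data set name result is hypothetical, and the &= form of %put needs SAS 9.3 or later. Because the words have gone through proc sort, they can be expected to come back in sorted order rather than in their original order.

```sas
/* write the macro variable to the log in the form NEW_STRING=<value> */
%put &=new_string;

/* or copy it into a data set variable (result is a hypothetical name) */
data result;
  length string $200;
  string = "&new_string";
run;
```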

A better solution for the new specification:

data temp(keep=i text);
  string = tranwrd("I hate the product. I hate it because it smells bad. I hate wasting money.","."," .");
  do i=1 to count(string," ")+1;
    text = scan(string,i," ");
    if text ne "" then do;
      output;
    end;
  end;
run;

proc sort data=temp;    
  by text i;
run;

data temp2;
  set temp;   
  by text i;
  if first.text OR text eq ".";
run;

proc sort data=temp2;    
  by i;
run;

proc sql noprint;
  SELECT text
  INTO :new_string separated by ","
  FROM temp2
  ;
quit;

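Answer 2 (score: 0)

Thanks, Robert. Just wanted to let you know that I found a flaw in your code. The inner loop modifies the string by deleting the duplicate word, but the outer loop moves on to the next position regardless. Example: "A,B,C,B,B" becomes "A,B,C,B", because the inner loop deletes the fourth item (a "B"), and the outer loop then never examines the last "B", since it has shifted into the position the fourth item used to occupy.

My solution: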
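
The effect can be reproduced with a small test. The following is a minimal sketch that reuses the loop from the earlier answer unchanged and only swaps in the example string; the data set name flaw_demo and the put statement are additions for illustration.

```sas
data flaw_demo;
    string = "A,B,C,B,B";
    length word $100;
    i = 2;
    do while(scan(string, i, ',') ^= '');
        word = scan(string, i, ',');
        do j = 1 to i - 1;
            if word = scan(string, j, ',') then do;
                /* locate the second occurrence of word and cut it out together with the comma before it */
                start = findw(string, word, ',', findw(string, word, ',', 't') + 1, 't');
                string = cats(substr(string, 1, start - 2), substr(string, start + length(word)));
                leave;
            end;
        end;
        /* i advances even when an item was just removed, so the item that moved into position i is skipped */
        i = i + 1;
    end;
    /* write the result to the log; per the description above, one duplicate "B" should remain */
    put string=;
    keep string;
run;
```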

data a;
    string = "spanner,span,spaniel,span,abc,span,bcc";
    length word $100;
    i = 2;
    do while(scan(string, i, ',') ^= '');
        hit = 0;
        word = scan(string, i, ',');
        do j = 1 to i - 1;
            if word = scan(string, j, ',') then do;
                start = findw(string, word, ',', findw(string, word, ',', 't') + 1, 't');
                string = cats(substr(string, 1, start - 2), substr(string, start + length(word)));
                hit = 1;
                leave;
            end;
        end;
        if hit = 0 then i = i + 1;
    end;
    keep string;
run;

Answer 3 (score: 0)

Build the list of unique words in a new variable.

data test;
  input string $80.;
  length newstring $80;
  do i=1 to countw(string,',');
    if not findw(newstring,scan(string,i,','),',','t') then
      newstring=catx(', ',newstring,scan(string,i,','))
    ;
  end;
cards;
spanner, span, spaniel, span
;