SAS: find how many words two strings have in common

Time: 2018-07-11 21:11:59

Tags: sas

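I have two strings. Fuzzy matching alone does not fully solve my problem, so I want to add an extra factor based on how many words are exactly the same in both strings.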

 DATA COMPONENT;
 infile datalines delimiter=','; 
 length FIRST $ 1000 FIRST_B $ 1000;
 INPUT FIRST $ FIRST_B $;
 DATALINES;
Electric Component keyboard replacement, Keyboard inward component replacement
Electric Component keyboard replacement, Monitor Component Replacement
Electric Component keyboard replacement, Mouse component
Electric Component keyboard replacement, Wire Replacement
Electric Component keyboard replacement, PIN part
;

 DATA Compged;
 SET COMPONENT;
 CALL COMPCOST('SWAP=', 5, 'P=', 0, 'INS=', 10,'DEL=',10,'APPEND=',5);
 First_COMPGED=COMPGED(FIRST, FIRST_B, 'iln');
 RUN;

 PROC SORT DATA= Compged;
  BY  First_COMPGED;
 RUN;

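The columns FIRST and FIRST_B are matched, and the best match for "Electric Component keyboard replacement" comes out as "Monitor Component Replacement", but I wanted "Keyboard inward component replacement". So I am trying to find the words these strings have in common and use that as an additional factor. I used this code to split the strings into words.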

data word_split;
  set COMPGED;
  delims = ' ,.-!';
  array FIRST_B_WORDS[6] $15 FIRST_B1-FIRST_B6;
  array FIRST_WORDS[6] $15 FIRST1-FIRST6;
  /* The word counts do not depend on i, so compute them once per row */
  count_words_B = countw(FIRST_B, delims);
  count_words = countw(FIRST, delims);
  /* Use the same delimiter list for SCAN as for COUNTW */
  do i = 1 to 6;
    FIRST_B_WORDS[i] = scan(FIRST_B, i, delims);
    FIRST_WORDS[i] = scan(FIRST, i, delims);
  end;
  drop i delims;
run;

Is there a way to compare FIRST_B1-FIRST_B6 with FIRST1-FIRST6 to see how many words they have in common, so that I can add that count as a factor?
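One possible way to do the comparison asked about here is a nested loop over the two arrays. The following is only a minimal sketch, assuming the word_split data set and its FIRST1-FIRST6 and FIRST_B1-FIRST_B6 variables from the step above. The names word_match and common_words are hypothetical and introduced only for illustration. The comparison ignores case (via UPCASE) and counts each distinct word of FIRST at most once.

```sas
/* Sketch: count the words shared by FIRST and FIRST_B, ignoring case.  */
/* word_match and common_words are hypothetical names, not from the     */
/* original post.                                                       */
data word_match;
  set word_split;
  array FIRST_B_WORDS[*] FIRST_B1-FIRST_B6;
  array FIRST_WORDS[*] FIRST1-FIRST6;
  common_words = 0;
  do i = 1 to dim(FIRST_WORDS);
    if not missing(FIRST_WORDS[i]) then do;
      /* dup = 1 if this word already appeared earlier in FIRST */
      dup = 0;
      do k = 1 to i - 1;
        if upcase(FIRST_WORDS[k]) = upcase(FIRST_WORDS[i]) then dup = 1;
      end;
      /* found = 1 if this word appears anywhere in FIRST_B */
      found = 0;
      do j = 1 to dim(FIRST_B_WORDS);
        if upcase(FIRST_WORDS[i]) = upcase(FIRST_B_WORDS[j]) then found = 1;
      end;
      if found and not dup then common_words = common_words + 1;
    end;
  end;
  drop i j k dup found;
run;

/* One way to use the count as a factor: most shared words first, */
/* then the lowest COMPGED cost as a tie-breaker.                 */
proc sort data=word_match;
  by descending common_words First_COMPGED;
run;
```

On the sample rows this should count 3 shared words for "Keyboard inward component replacement" and 2 for "Monitor Component Replacement", so the keyboard row would sort first. The sketch inherits the limits of the splitting step: only the first 6 words are compared, and each word is truncated to 15 characters. How to weight common_words against First_COMPGED is a design choice; sorting on it first is only one option.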

0 answers:

No answers