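I have 2 tables with the same structure but different data sets. I want to compare the words in a single field of table A against the same column in the row of table B that has the same id, and get the match percentage for each id.
Example:
Here are the entries in table A: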
Row1: 1, salt water masala
Row2: 2, water onion maggi milk
Here are the entries in table B:
Row1: 1, salt masala water
Row2: 2, water onion maggi
Expected result:
Row1: Match 100% (All the 3 words are available but different order)
Row2: Match 75% as 1 word does not match out of the 4 words.
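It would be really great if someone could help.
Answer 0 (score: 0)
Although this would be easier to implement in application code, it can be done with a couple of MySQL functions: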
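delimiter //

drop function if exists string_splitter //

-- Returns the pos-th delim-separated token of str (1-based).
-- "deterministic" is declared because MySQL can reject a function that has
-- no such characteristic when binary logging is enabled.
create function string_splitter(
    str text,
    delim varchar(25),
    pos int) returns text
deterministic
begin
    -- Take the first pos tokens, then strip the first pos - 1 tokens and the delimiter after them.
    return replace(substring_index(str, delim, pos), concat(substring_index(str, delim, pos - 1), delim), '');
end //

drop function if exists percentage_of_matches //

-- Returns the percentage of words in str1 that also occur as whole words in str2.
create function percentage_of_matches(
    str1 text,
    str2 text) returns double
deterministic
begin
    -- Local variables instead of @session variables, so nothing leaks into the session.
    declare i int default 1;
    declare num_words int;
    declare num_matches int default 0;
    declare cur_word text;

    set str1 = trim(str1);
    set str2 = trim(str2);

    -- Collapse runs of spaces: replace two spaces with one until no double space remains.
    while instr(str1, '  ') do
        set str1 = replace(str1, '  ', ' ');
    end while;
    while instr(str2, '  ') do
        set str2 = replace(str2, '  ', ' ');
    end while;

    -- Word count = number of single spaces + 1.
    set num_words = 1 + length(str1) - length(replace(str1, ' ', ''));

    while i <= num_words do
        set cur_word = string_splitter(str1, ' ', i);
        -- The word matches if it is all of str2, or sits at its start, end, or middle.
        if str2 = cur_word or str2 like concat(cur_word, ' %') or str2 like concat('% ', cur_word) or str2 like concat('% ', cur_word, ' %') then
            set num_matches = num_matches + 1;
        end if;
        set i = i + 1;
    end while;

    return (num_matches / num_words) * 100;
end //

delimiter ;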
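The first function is a helper used by the second one. The second, percentage_of_matches, is the one you call from your code, like this: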
select percentage_of_matches('salt water masala', 'salt masala water');
select percentage_of_matches('water onion maggi milk', 'water onion maggi');
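The two calls above compare string literals. To get one percentage per id, as the question asks, the same function can be applied across a join of the two tables. This is only a sketch: the question does not name its tables or columns, so table_a, table_b, id and words are hypothetical names chosen for illustration, and it assumes percentage_of_matches has already been created.

```sql
-- Hypothetical schema: the question does not give the real table or column names.
create table table_a (id int primary key, words text);
create table table_b (id int primary key, words text);

-- Sample rows taken from the question.
insert into table_a values (1, 'salt water masala'), (2, 'water onion maggi milk');
insert into table_b values (1, 'salt masala water'), (2, 'water onion maggi');

-- One row per id: the share of table_a's words that also appear in table_b.
select a.id,
       percentage_of_matches(a.words, b.words) as match_percent
from table_a a
join table_b b on b.id = a.id
order by a.id;
```

With the sample rows, the intended result is 100 for id 1 and 75 for id 2, matching the expected output in the question. Note that the percentage is relative to the number of words in the first argument, so swapping the arguments can change the result: comparing 'water onion maggi' against 'water onion maggi milk' finds all 3 of its words and gives 100 rather than 75.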