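Task: I have two columns of product names. I need to find the cell in column B that is most similar to cell A1, then do the same for A2, A3, and so on.
Input: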
Col A | Col B
-------------
Red | Blackwell
Black | Purple
White | Whitewater
Green | Reddit
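Output:
Red = Reddit / 66% similar
Black = Blackwell / 71% similar
White = Whitewater / 66% similar
Green = Reddit / 30% similar
I think Levenshtein distance could help with the ranking, but I don't know how to apply it.
Thanks in advance, any information helps.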
Answer 0: (score: 3)
<?php
// Arrays of words
$colA = ['Red', 'Black', 'White', 'Green'];
$colB = ['Blackwell', 'Purple', 'Whitewater', 'Reddit'];
// loop through words to find the closest
foreach ($colA as $a) {
    // Current max number of matches
    $maxMatches = -1;
    $bestMatch = '';
    foreach ($colB as $b) {
        // Calculate the number of matches
        $matches = similar_text($a, $b, $percent);
        if ($matches > $maxMatches) {
            // Found a better match, update
            $maxMatches = $matches;
            $bestMatch = $b;
            $matchPercentage = $percent;
        }
    }
    echo "$a = $bestMatch / " .
        number_format($matchPercentage, 2) .
        "% similar\n";
}
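The outer loop iterates over the elements of the first array. For each element it resets the best match found so far and the number of matching characters for that match.
The inner loop iterates over the array of possible matches, looking for the best one. For each candidate it checks the similarity (you could use levenshtein here instead of similar_text, but the latter is convenient because it calculates the percentage for you). If the current candidate is a better match than the best one so far, the variables are updated.
For each word in the outer loop, we echo the best match found and its percentage. Format as needed.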
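If you would rather try levenshtein in the inner loop, a minimal sketch could look like the one below. It assumes single-byte (ASCII) names, because levenshtein() compares bytes. The helper names levenshteinPercent and closestByLevenshtein, and the choice to normalise the distance by the length of the longer string, are my own assumptions and not something the code above does.

```php
<?php
// Hypothetical helper: turns a Levenshtein distance into a 0-100 score
// by normalising on the length of the longer string.
function levenshteinPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $longest) * 100;
}

// Hypothetical helper: returns [best candidate, score] for one word.
function closestByLevenshtein(string $word, array $candidates): array
{
    $bestMatch = '';
    $bestScore = -1.0;
    foreach ($candidates as $candidate) {
        $score = levenshteinPercent($word, $candidate);
        // Strict ">" keeps the first candidate when several tie
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestMatch = $candidate;
        }
    }
    return [$bestMatch, $bestScore];
}

$colA = ['Red', 'Black', 'White', 'Green'];
$colB = ['Blackwell', 'Purple', 'Whitewater', 'Reddit'];

foreach ($colA as $a) {
    [$match, $score] = closestByLevenshtein($a, $colB);
    echo "$a = $match / " . number_format($score, 2) . "% similar\n";
}
```

Because this score is normalised differently from the similar_text percentage, the numbers (and occasionally the chosen match) will not necessarily agree with the similar_text version.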
Answer 1: (score: 0)
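I'm not sure where you got those desired percentages, so I'll just use the values that the PHP functions generate, and you can decide whether you want to do any further calculation on them.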
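For reference, the percentage that similar_text() reports can be reproduced by hand: it is the number of matching characters, doubled, divided by the combined length of both strings, times 100. The sketch below checks this for one pair from the question (the variable names are only illustrative). Whether a value such as 66.67 is displayed as 66% or 67% then depends on whether you truncate or round it.

```php
<?php
$a = 'Red';
$b = 'Reddit';

// similar_text() returns the number of matching characters and writes
// the percentage into the third argument (passed by reference).
$matches = similar_text($a, $b, $percent);

// Recompute the percentage by hand from the match count and lengths.
$manual = $matches * 2 / (strlen($a) + strlen($b)) * 100;

echo number_format($percent, 2), "\n"; // percentage with two decimals
echo number_format($manual, 2), "\n";  // hand-computed value, same format
echo (int) floor($percent), "\n";      // truncated to a whole number
echo number_format($percent), "\n";    // rounded to a whole number
```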
levenshtein() simply doesn't deliver the desired matches that you posted in your question. I think using similar_text() would be more sensible.
Code: (Demo)
$arrayA=['Red','Black','White','Green'];
$arrayB=['Blackwell','Purple','Whitewater','Reddit'];
// similar text
foreach($arrayA as $a){
    $temp=array_combine($arrayB,array_map(function($v)use($a){similar_text($v,$a,$percent); return $percent;},$arrayB)); // generate assoc array of assessments
    arsort($temp); // sort descending
    $result[]="$a is most similar to ".key($temp)." (sim-score:".number_format(current($temp))."%)"; // access first key and value
}
var_export($result);
echo "\n--\n";
// levenshtein doesn't offer the desired matching
foreach($arrayA as $a){
    $temp=array_combine($arrayB,array_map(function($v)use($a){return levenshtein($v,$a);},$arrayB)); // generate assoc array of assessments
    arsort($temp); // sort descending, so the LARGEST distance (the least similar candidate) comes first
    $result2[]="$a is most similar to ".key($temp)." (lev-score:".current($temp).")"; // access first key and value
}
var_export($result2);
Output:
array (
0 => 'Red is most similar to Reddit (sim-score:67%)',
1 => 'Black is most similar to Blackwell (sim-score:71%)',
2 => 'White is most similar to Whitewater (sim-score:67%)',
3 => 'Green is most similar to Purple (sim-score:36%)',
)
--
array (
0 => 'Red is most similar to Whitewater (lev-score:9)',
1 => 'Black is most similar to Whitewater (lev-score:9)',
2 => 'White is most similar to Blackwell (lev-score:8)',
3 => 'Green is most similar to Blackwell (lev-score:8)',
)
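One caveat when reading the lev-scores: a Levenshtein distance is a count of edits, so a smaller number means a closer match, and the candidate with the smallest distance is the nearest one. As a minimal sketch of that reading, the snippet below collects every candidate that shares the smallest distance, since ties are possible (with this sample data, 'White' is at distance 5 from three different candidates). The helper name nearestByLevenshtein is hypothetical, and the sketch assumes a non-empty candidate list of single-byte strings.

```php
<?php
// Hypothetical helper: returns [smallest distance, all candidates at that distance].
// Assumes $candidates is non-empty; min() on an empty array fails in PHP 8.
function nearestByLevenshtein(string $word, array $candidates): array
{
    $distances = [];
    foreach ($candidates as $candidate) {
        $distances[$candidate] = levenshtein($candidate, $word);
    }
    $min = min($distances);
    // array_keys() with a search value returns every key holding that value
    return [$min, array_keys($distances, $min, true)];
}

$arrayA = ['Red', 'Black', 'White', 'Green'];
$arrayB = ['Blackwell', 'Purple', 'Whitewater', 'Reddit'];

foreach ($arrayA as $a) {
    [$distance, $nearest] = nearestByLevenshtein($a, $arrayB);
    echo "$a is nearest to " . implode(', ', $nearest) . " (lev-score:$distance)\n";
}
```

Even read this way, raw distances favour candidates of similar length, which is one reason the similar_text percentage can be the more convenient measure here.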