Rearrange the words in an array to match positions using Levenshtein distance in PHP

Time: 2019-02-01 10:13:39

Tags: php arrays sorting fuzzywuzzy

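I want to rearrange the words in an array according to their positions in a first array. My code has two arrays: the first is the base array, which I compare the second array against so that its words end up in the same positions as in the first.

There are two inputs, one of which is the base. I apply levenshtein(metaphone(each database word), metaphone(each bank word)) and, based on the result, arrange the words of bankdata in a new array.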

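databaseName = LAL BAHADUR SHASTRI, bankdata = SHASTRI LAL. The code only rearranges bankdata and stores it in a new array, giving bankdata : LAL SHASTRI

The reordering is correct; I only need the words placed at the right positions in the array.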

        $db = 'LAL BAHADUR SHASTRI YADAV';
        $bank = 'SHASTRI LAL';
        $a = reArrangeArray($db,$bank);

        function reArrangeArray($db,$bank)
        {
            $dataBaseName = $db;
            $bankdataRows = [$db,$bank,];
            $dbWords = preg_split("#[\s]+#", $dataBaseName);     
            foreach ($bankdataRows as $bankdata)
            {
            $bankWords = preg_split("#[\s]+#", trim($bankdata));
            $result    = [];    
            if(!empty($bankWords))
                foreach ($dbWords as $dbWord)
                {
                $idx   = null;
                $least = PHP_INT_MAX;
                foreach ($bankWords as $k => $bankWord)
                    if (($lv = levenshtein(metaphone($bankWord),metaphone($dbWord))) < $least)
                    {
                    $least = $lv;
                    $idx   = $k;
                    }
                @$result[] = $bankWords[$idx];
                unset($bankWords[$idx]);
                }
            $result = array_merge($result, $bankWords);
            var_dump($result);
            }
        }

Case 1: current output

        array (size=4)
        0 => string 'LAL' (length=3)
        1 => string 'BAHADUR' (length=7)
        2 => string 'SHASTRI' (length=7)
        3 => string 'YADAV' (length=5)

        array (size=4)
        0 => string 'LAL' (length=3)
        1 => string 'SHASTRI' (length=7)
        2 => null
        3 => null

Expected output

I need the same array positions as in the database array

        $dbName = 'LAL BAHADUR SHASTRI YADAV';
        $bankName = 'SHASTRI LAL';

        array of db (size=4)
        0 => string 'LAL' (length=3)
        1 => string 'BAHADUR' (length=7)
        2 => string 'SHASTRI' (length=7)
        3 => string 'YADAV' (length=5)

        array of bankname (size=4)
        0 => string 'LAL' (length=3)
        1 => #
        2 => string 'SHASTRI' (length=7)
        3 => ###

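If no bank word matches a word of the first array, that position should be filled with `#` characters, one per index. For example, position 3 has no matching element, so it holds three `#`.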

        array (size=4)
        0 => string 'LAL' (length=3)
        1 => string 'BAHADUR' (length=7)
        2 => string 'SHASTRI' (length=7)
        3 => string 'YADAV' (length=5)

        array (size=4)
        0 => string 'LAL' (length=3)
        1 => string 'SHASTRI' (length=7)
        2 => null
        3 => null

Expected output

I need the same array positions as in the database array

        $dbName = 'LAL BAHADUR SHASTRI YADAV';
        $bankName = 'SHARI LAL';

        array of db (size=4)
        0 => string 'LAL' (length=3)
        1 => string 'BAHADUR' (length=7)
        2 => string 'SHASTRI' (length=7)
        3 => string 'YADAV' (length=5)

        array of bankname (size=4)
        0 => string 'LAL' (length=3)
        1 => #
        2 => string 'SHARI' (length=5)
        3 => ###

This case is decided by levenshtein(metaphone($bankWord),metaphone($dbWord)).
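To make the comparison above concrete, here is a minimal sketch that wraps it in a helper and prints the distance from the bank word SHARI to each database word. The name `phoneticDistance` is hypothetical and not part of the original code; `metaphone()` and `levenshtein()` are PHP built-ins.

```php
<?php
// Hypothetical helper: Levenshtein distance between the metaphone keys
// of two words, i.e. the same comparison the question's code performs.
function phoneticDistance(string $a, string $b): int
{
    return levenshtein(metaphone($a), metaphone($b));
}

// Print how far the bank word is from every database word.
foreach (['LAL', 'BAHADUR', 'SHASTRI', 'YADAV'] as $dbWord) {
    printf("%-8s %d\n", $dbWord, phoneticDistance('SHARI', $dbWord));
}
```

The smaller the number, the closer the two words sound. Note that ties are possible, so any matching logic built on this has to decide which candidate wins.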

Case 2

Input:

$dbName = NikithaRani MohanRao $bankdata = Nikitha Rani Mohan Rao

Output : $newbankdata = NikithaRani MohanRao

If words are concatenated in $dbName, the corresponding bank words should be concatenated as well.
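One possible way to handle this case, sketched under the assumption that a split name matches a database word exactly once its two parts are joined (compared case-insensitively). The function name `mergeSplitWords` is hypothetical, and joining only adjacent pairs is a simplification.

```php
<?php
// Sketch: join two adjacent bank words when their concatenation
// is exactly one of the database words (case-insensitive).
function mergeSplitWords(array $dbWords, array $bankWords): array
{
    $dbLookup  = array_flip(array_map('strtolower', $dbWords));
    $bankWords = array_values($bankWords); // ensure 0..n-1 keys
    $merged    = [];
    $n         = count($bankWords);

    for ($i = 0; $i < $n; $i++) {
        if ($i + 1 < $n) {
            $joined = $bankWords[$i] . $bankWords[$i + 1];
            if (isset($dbLookup[strtolower($joined)])) {
                $merged[] = $joined;
                $i++; // the next word has been consumed
                continue;
            }
        }
        $merged[] = $bankWords[$i];
    }
    return $merged;
}

$db   = explode(' ', 'NikithaRani MohanRao');
$bank = explode(' ', 'Nikitha Rani Mohan Rao');
echo implode(' ', mergeSplitWords($db, $bank)), "\n"; // NikithaRani MohanRao
```

Running this as a preprocessing step before the positional matching would keep the two concerns separate. A phonetic comparison could replace the exact match if spelling variants are expected.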

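Note

Positions are determined only by comparing against the first array; the words of the second array are simply moved within the array.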

[Image: Expected Output]

[Image: Problem Diagram]

1 answer:

Answer 0 (score: 1)

I'm not sure I understood the whole question, but let's try to solve just the array rearrangement:

$a1 = explode(" ", "LAL BAHADUR SHASTRI YADAV");
// sort $a1 into whatever order you need
$a2 = explode(" ", "SHASTRI LAL");

$res = [];
foreach ($a1 as $key => $e) { // keep each element found in $a2, otherwise fill with "#"
    $res[$key] = in_array($e, $a2) ? $e : str_repeat("#", $key);
}

str_repeat repeats the character x times. This code runs in O(n*m); if needed, it can be modified to run in O(n), where n is the number of elements in the first array.
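As a sketch of the linear-time variant mentioned above: flipping the second array turns its words into keys, so each membership check becomes an `isset()` lookup instead of an `in_array()` scan. The function name `rearrangeExact` is made up for this example, and it still matches exact words only.

```php
<?php
// Sketch: exact-match rearrangement with a hash lookup instead of in_array().
function rearrangeExact(array $a1, array $a2): array
{
    $lookup = array_flip($a2); // word => index, gives constant-time membership checks
    $res    = [];
    foreach ($a1 as $key => $e) {
        // Keep the word if the second array contains it, else "#" repeated $key times.
        $res[$key] = isset($lookup[$e]) ? $e : str_repeat("#", $key);
    }
    return $res;
}

print_r(rearrangeExact(
    explode(" ", "LAL BAHADUR SHASTRI YADAV"),
    explode(" ", "SHASTRI LAL")
));
// Result: [0 => 'LAL', 1 => '#', 2 => 'SHASTRI', 3 => '###']
```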

I hope that helps; feel free to comment.

Edited:

First, define a function that finds the index with the minimum Levenshtein distance:

function foundLevenshteinMinIndex($word, $arr) {
    $word = metaphone($word);
    $a = [];
    foreach ($arr as $k => $e)
        $a[$k] = levenshtein($word, metaphone($e)); // keep $arr's own keys
    return array_search(min($a), $a); // key of the first smallest distance
}

Now use it with the same $a1 and $a2:

$res = []; // start empty so values from the previous snippet do not leak in
foreach ($a2 as $w) {
    $i = foundLevenshteinMinIndex($w, $a1);
    // take the slot if it is free, or if $w is phonetically closer than the current occupant
    if (!isset($res[$i]) || (levenshtein(metaphone($a1[$i]), metaphone($res[$i])) > levenshtein(metaphone($a1[$i]), metaphone($w))))
        $res[$i] = $w;
}

for ($i = 0; $i < count($a1); $i++)
    if (!isset($res[$i])) // if the index is not set, fill it with "#"
        $res[$i] = str_repeat("#", $i);
// reorder by integer index
ksort($res);

Edit 2

Take a look at this implementation:

$a1 = explode(" ", 'LAL BAHADUR SHASTRI YADAV');
$a2 = explode(" ",'SHASTRI LAL NABA');

function getDist($a1, $a2) {
    $arr = [];
    foreach($a2 as $k1 => $w1)
        foreach($a1 as $k2 => $w2)
            $arr[$k1][$k2] = levenshtein(metaphone($w1), metaphone($w2));
    return $arr;
}

function getMin($arr) {
    $min = PHP_INT_MAX;
    $minX = $minY = null;
    foreach($arr as $x => $row)
        foreach($row as $y => $cell)
            if ($cell < $min) {
                $min = $cell;
                $minX = $x;
                $minY = $y;
            }
    return array($minX, $minY);
}

function removeIndex($arr, $x, $y) {
    unset($arr[$x]);
    foreach($arr as &$row)
        unset($row[$y]);
    unset($row); // break the reference left over from the loop
    return $arr;
}

$res = [];
$arr = getDist($a1, $a2);
while (count($arr) && count(reset($arr))) {
    list($x, $y) = getMin($arr); // closest remaining (bank word, db position) pair
    if (!isset($res[$y]))
        $res[$y] = $a2[$x];
    $arr = removeIndex($arr, $x, $y);
}

for($i = 0; $i < count($a1); $i++)
    if (!isset($res[$i])) // if the index is not set, fill it with "#"
        $res[$i] = str_repeat("#", $i);
ksort($res);

Note that this code runs in O(n*(m^2)) time, where n is the size of the first array and m is the size of the second.
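The greedy matching above rescans the whole distance matrix on every pass. As a rough sketch of an alternative (not the answerer's code), all pairs can be computed once, sorted by distance, and consumed closest-first. The name `alignWords` is hypothetical; ties are broken by bank index and then database index, which should mirror the scan order of getMin.

```php
<?php
// Sketch: sort-based greedy alignment of bank words onto database positions.
function alignWords(array $dbWords, array $bankWords): array
{
    $dbWords   = array_values($dbWords);
    $bankWords = array_values($bankWords);

    // Every candidate pair as [distance, bank index, db index].
    $pairs = [];
    foreach ($bankWords as $b => $bankWord) {
        foreach ($dbWords as $d => $dbWord) {
            $pairs[] = [levenshtein(metaphone($bankWord), metaphone($dbWord)), $b, $d];
        }
    }
    // Arrays of equal length compare element by element, so this orders
    // by distance first, then bank index, then db index.
    sort($pairs);

    $res      = [];
    $usedBank = [];
    foreach ($pairs as [$dist, $b, $d]) {
        if (isset($usedBank[$b]) || isset($res[$d])) {
            continue; // word already placed, or position already taken
        }
        $res[$d]      = $bankWords[$b];
        $usedBank[$b] = true;
    }

    // Fill unmatched positions with "#" repeated index times, as in the answer.
    for ($i = 0; $i < count($dbWords); $i++) {
        if (!isset($res[$i])) {
            $res[$i] = str_repeat("#", $i);
        }
    }
    ksort($res);
    return $res;
}

print_r(alignWords(
    explode(' ', 'LAL BAHADUR SHASTRI YADAV'),
    explode(' ', 'SHASTRI LAL')
));
// Result: [0 => 'LAL', 1 => '#', 2 => 'SHASTRI', 3 => '###']
```

Sorting the n*m pairs once costs O(n*m*log(n*m)), which may be preferable for longer names. Like the original, it places every bank word somewhere if a free position remains, however distant the match.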