How to detect the changed parts across multiple texts

Time: 2014-08-18 02:49:31

Tags: php

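I have multiple strings in an array, and I need to find which parts (= words) differ between them: both the changed text and its position. What algorithm should I use?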

Input

$strings = [
    'a' => 'Blah blah. Value of something is 123456 and it is about 50%.',
    'b' => 'Blah blah. Value of something is 10203 and it is about 75%.',
    'c' => 'Blah blah. Value of something is 9999 and it is about 500%.',
    // more rows like this
];

Output

$output = 'Blah blah. Value of something is [a=123456|b=10203|c=9999] and it is about [a=50%|b=75%|c=500%].';

(Yes, I will add some fancy HTML mouseover at some point.)


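I am currently experimenting with PHP-FineDiff, but it gets too cumbersome once I want to compare more than two strings with each other. (Should I write some huge loop that checks one character at a time, or try regular expressions, or ...?)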

2 answers:

Answer 0 (score: 0)

If I were you, I would break the strings into words and compare the resulting arrays position by position:

$strings = [
    'a' => 'Blah blah. Value of something is 123456 and it is about 50%.',
    'b' => 'Blah blah. Value of something is 10203 and it is about 75%.',
    'c' => 'Blah blah. Value of something is 9999 and it is about 500%.',
    // more rows like this
];
$strings_exploded = [];
// re-index with array_values(): numbered arrays are much easier to iterate
foreach (array_values($strings) as $i => $string) {
    $strings_exploded[$i] = explode(' ', $string);
}

if (count($strings_exploded) > 0) {
    foreach ($strings_exploded[0] as $i => $word) {
        for ($j = 1; $j < count($strings_exploded); $j++) {
            // isset() guards against strings with fewer words than the first one
            if (isset($strings_exploded[$j][$i]) && $strings_exploded[$j][$i] === $word) {
                // same word
            } else {
                // different word, add it to a difference string maybe?
            }
        }
    }
}
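The "different word" branch above is left open. One way it could be filled in is sketched below, assuming PHP 7 or later and assuming that splitting on a single space is good enough for the input. The function name `findChangedWords` is hypothetical and not part of the original answer. It returns only the word positions where at least one string disagrees with the others, together with the word each string has at that position.

```php
<?php
// Hypothetical helper (not from the original answer): for every word position
// where the strings disagree, return the word found in each string.
function findChangedWords(array $strings): array
{
    // Split every string into words; array_map keeps the keys ('a', 'b', ...)
    $words = array_map(function ($s) {
        return explode(' ', $s);
    }, $strings);

    // The longest word list decides how many positions to inspect
    $length = $words ? max(array_map('count', $words)) : 0;

    $changed = [];
    for ($i = 0; $i < $length; $i++) {
        // Word at position $i in each string (null where a string is shorter)
        $column = array_map(function ($w) use ($i) {
            return $w[$i] ?? null;
        }, $words);

        // Strictly compare every entry with the first one
        $first = reset($column);
        foreach ($column as $word) {
            if ($word !== $first) {
                $changed[$i] = $column;
                break;
            }
        }
    }

    return $changed;
}

$changed = findChangedWords([
    'a' => 'Blah blah. Value of something is 123456 and it is about 50%.',
    'b' => 'Blah blah. Value of something is 10203 and it is about 75%.',
    'c' => 'Blah blah. Value of something is 9999 and it is about 500%.',
]);
print_r(array_keys($changed)); // positions 6 and 11
```

Because the comparison is purely positional, an inserted or deleted word shifts every later position and makes them all look changed. Handling that case would need a real diff algorithm rather than this sketch.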

Answer 1 (score: 0)

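This may bring you closer to an answer: you can split each sentence into an array of words and then run array_diff() over all of them collectively. Finally, combine the words again, building a nested array at each position where a difference was found: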

$strings = [
    'a' => 'Blah blah. Value of something is 123456 and it is about 50%.',
    'b' => 'Blah blah. Value of something is 10203 and it is about 75%.',
    'c' => 'Blah blah. Value of something is 9999 and it is about 500%.',
];

// turn sentences into arrays of "words" (adjust where necessary)
$tmp = array_map(function($arr) {
    return explode(' ', $arr);
}, $strings);

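// find collective differences; array_values() drops the string keys, which
// PHP 8 would otherwise treat as named arguments and reject
// note: array_diff() compares by value, not position, so a word of the first
// string that also occurs anywhere in another string is not reported
$diff = call_user_func_array('array_diff', array_values($tmp));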

// build final result
$result = [];
foreach ($tmp as $id => $words) {
    foreach ($words as $index => $word) {
        if (isset($diff[$index])) {
            $result[$index][$id] = $word;
        } else {
            $result[$index] = $word;
        }
    }
}

print_r($result);

Output

Array
(
    [0] => Blah
    [1] => blah.
    [2] => Value
    [3] => of
    [4] => something
    [5] => is
    [6] => Array
        (
            [a] => 123456
            [b] => 10203
            [c] => 9999
        )

    [7] => and
    [8] => it
    [9] => is
    [10] => about
    [11] => Array
        (
            [a] => 50%.
            [b] => 75%.
            [c] => 500%.
        )

)
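To get from this array to the single string asked for in the question, the entries can be joined back together. The following is a minimal sketch, assuming the structure printed above (plain strings for shared words, nested arrays keyed by string id for differing ones). The function name `renderVariants` is hypothetical.

```php
<?php
// Hypothetical helper: joins the $result structure shown above into one string.
function renderVariants(array $result): string
{
    $parts = [];
    foreach ($result as $entry) {
        if (is_array($entry)) {
            // Differing position: list every variant as id=word
            $pairs = [];
            foreach ($entry as $id => $word) {
                $pairs[] = $id . '=' . $word;
            }
            $parts[] = '[' . implode('|', $pairs) . ']';
        } else {
            // Shared word: keep as-is
            $parts[] = $entry;
        }
    }

    return implode(' ', $parts);
}

// Same data as the print_r() output above
$result = [
    'Blah', 'blah.', 'Value', 'of', 'something', 'is',
    ['a' => '123456', 'b' => '10203', 'c' => '9999'],
    'and', 'it', 'is', 'about',
    ['a' => '50%.', 'b' => '75%.', 'c' => '500%.'],
];

$rendered = renderVariants($result);
echo $rendered, "\n";
// Blah blah. Value of something is [a=123456|b=10203|c=9999] and it is about [a=50%.|b=75%.|c=500%.]
```

Note that the trailing period stays inside the last bracket, because it is part of the word after splitting on spaces. The output in the question places it outside, which would need separate punctuation handling.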