Similar text, without losing text formatting

Time: 2017-10-08 04:29:42

Tags: php regex string similarity preg-replace-callback

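How can I use the function similar_text() to correct words in a text without losing the text's formatting? I want to write a function that matches similar words case-insensitively with a regular expression and keeps the original formatting. I have not managed to get anything working myself, because I do not understand how this kind of similar-text matching is supposed to be done. All I tried was splitting the text into words on spaces with explode(). My attempt does not work for any letter case, and it does not restore the original formatting or the original case of the words. How should I approach this, and how would you solve it? The function should also be as fast as possible on as much data as possible (large arrays). The correct words are given up front as an array of correct word variants (in lowercase). Using the words from this array, the text has to be corrected as accurately as possible, and each correction has to be written in the letter case of the source word in the text.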

Example

$dict = array(
    "two",
    "occasions",
    "have",
    "been",
    "asked",
    "members",
    "parliament",
    "pray",
    "babbage",
    "you",
    "into",
    "machine",
    "wrong",
    "figures",
    "will",
    "right",
    "answers",
    "come",
    "able",
    "rightly",
    "apprehend",
    "confusion",
    "ideas",
    "question"
);

Input text:

On tw ocasons I hve bee aked [by mebers of Pariamnt]: 'Pry, Mr. Babage, if you put ito the mahine wrng figres, wll the rigt aswers cme out?' I am not ale rghty to aprehend the kind of conusion of idas that could provoke such a quetion. Charles Babbage

Desired result:

On two occasions I have been asked [by members of Parliament]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. Charles Babbage

1 answer:

Answer 0 (score: 0)

function similar_correcting($text, $dict)
{
    // Lower-case the text and split it into unique words (runs of letters).
    $words = preg_split('/\PL+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $input = array_unique($words);

    // Map every input word to its closest dictionary word.
    $res = array();
    foreach ($input as $in)
    {
        $match = 0;     // best similarity percentage found so far
        $result = null; // best dictionary word so far; reset for every input word
        foreach ($dict as $correct)
        {
            similar_text($correct, $in, $percent); // similarity in percent
            if ($percent > $match)
            {
                $result = $correct;
                $match = $percent;
            }
        }
        if ($result !== null) // leave the word alone if nothing is similar at all
        {
            $res[$in] = $result;
        }
    }

    // Replace every word in the original text and restore its original letter case.
    $response = preg_replace_callback('/\pL+/u', function ($m) use ($res) {
        $word = mb_strtolower($m[0]);
        if (!isset($res[$word])) {
            return $m[0];
        }
        $repl = $res[$word];
        if ($word === $m[0]) return $repl; // all lower case
        if (mb_strtoupper($word) === $m[0]) return mb_strtoupper($repl); // all upper case
        if (mb_convert_case($word, MB_CASE_TITLE) === $m[0]) return mb_convert_case($repl, MB_CASE_TITLE); // title case
        // Mixed case: copy the case position by position; extra letters stay lower case.
        $mixed = '';
        for ($i = 0, $len = mb_strlen($repl); $i < $len; ++$i) {
            $char = mb_substr($repl, $i, 1);
            $isUpper = mb_substr($word, $i, 1) !== mb_substr($m[0], $i, 1);
            $mixed .= $isUpper ? mb_strtoupper($char) : $char;
        }
        return $mixed;
    }, $text);

    return $response; // the text with every word replaced by its closest dictionary form
}

$text = "On tw ocasons I hve bee aked [by mebers of Pariamnt]: 'Pry, Mr. Babage, if you put ito the mahine wrng figres, wll the rigt aswers cme out?' I am not ale rghty to aprehend the kind of conusion of idas that could provoke such a quetion. Charles Babbage";

echo similar_correcting($text, $dict);
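
To see what drives the choice of replacement, here is a small sketch of the scoring step in isolation. The helper name score_candidates() is made up for this illustration; similar_text() is the built-in PHP function the answer relies on.

```php
<?php
// Hypothetical helper: score one misspelled word against several candidates.
function score_candidates(string $word, array $candidates): array
{
    $scores = array();
    foreach ($candidates as $candidate) {
        // similar_text() returns the number of matching characters and
        // writes the similarity percentage into its third argument.
        similar_text($candidate, $word, $percent);
        $scores[$candidate] = round($percent, 1);
    }
    arsort($scores); // highest percentage first, keys preserved
    return $scores;
}

print_r(score_candidates("tw", array("two", "to", "into")));
```

For "tw", the candidate "two" scores 80, "to" scores 50 and "into" about 33.3, so "two" is the word that ends up in the replacement map.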

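Input text:

On tw ocasons I hve bee aked [by mebers of Pariamnt]: 'Pry, Mr. Babage, if you put ito the mahine wrng figres, wll the rigt aswers cme out?' I am not ale rghty to aprehend the kind of conusion of idas that could provoke such a quetion. Charles Babbage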

Output text:

[Garbled output: words that are missing from the dictionary are replaced by their nearest dictionary entries, for example "On" becomes "Wrong" and "I" becomes "INTO".]

As you can see, the result is wrong. But that is because some of the words do not exist in the dictionary.
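
If extending the dictionary is not an option, one possible alternative is to accept a replacement only above a minimum similarity, so that unknown words are left untouched. This is only a sketch: the helper name closest_word() and the 60 percent threshold are assumptions for illustration, not part of the answer, and the threshold would need tuning on real data. The nullable return type requires PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the closest dictionary word, or null when no
// candidate reaches the (assumed) minimum similarity percentage.
function closest_word(string $word, array $dict, float $minPercent = 60.0): ?string
{
    $best = null;
    $bestPercent = 0.0;
    foreach ($dict as $candidate) {
        similar_text($candidate, $word, $percent);
        if ($percent >= $minPercent && $percent > $bestPercent) {
            $best = $candidate;
            $bestPercent = $percent;
        }
    }
    return $best;
}

$dict = array("two", "into", "wrong", "question");
var_dump(closest_word("tw", $dict)); // "two", which is 80 percent similar
var_dump(closest_word("on", $dict)); // null, the best candidate stays below 60 percent
```

A caller would then keep the original word whenever null comes back, instead of forcing the nearest dictionary entry into the text.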

Let's add the necessary words to the dictionary:

$dict = array
(
    "on",
    "two",
    "occasions",
    "i",
    "have",
    "been",
    "asked",
    "by",
    "members",
    "of",
    "parliament",
    "pray",
    "mr",
    "babbage",
    "if",
    "you",
    "put",
    "into",
    "the",
    "machine",
    "wrong",
    "figures",
    "will",
    "right",
    "answers",
    "come",
    "out",
    "am",
    "not",
    "able",
    "rightly",
    "to",
    "apprehend",
    "kind",
    "confusion",
    "ideas",
    "that",
    "could",
    "provoke",
    "such",
    "a",
    "question",
    "charles",
);

$text = "On tw ocasons I hve bee aked [by mebers of Pariamnt]: 'Pry, Mr. Babage, if you put ito the mahine wrng figres, wll the rigt aswers cme out?' I am not ale rghty to aprehend the kind of conusion of idas that could provoke such a quetion. Charles Babbage";

echo similar_correcting($text, $dict);

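Input text:

On tw ocasons I hve bee aked [by mebers of Pariamnt]: 'Pry, Mr. Babage, if you put ito the mahine wrng figres, wll the rigt aswers cme out?' I am not ale rghty to aprehend the kind of conusion of idas that could provoke such a quetion. Charles Babbage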

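Output text:

On two occasions I have been asked [by members of Parliament]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. Charles Babbage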

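The text the user wanted:

On two occasions I have been asked [by members of Parliament]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. Charles Babbage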
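
The question also asks for speed on large arrays, which the answer does not address. The following is a rough sketch of two possible shortcuts, not a measured optimization: skip scoring for words that are already in the dictionary, and skip candidates whose length is too different. The helper name closest_word_fast(), the $known lookup table and the length limit of 2 are all assumptions made for this illustration.

```php
<?php
// Hypothetical helper: nearest-word search with two shortcuts for large dictionaries.
function closest_word_fast(string $word, array $dict, array $known, int $maxLenDiff = 2): ?string
{
    // Shortcut 1: a correctly spelled word needs no scoring at all.
    if (isset($known[$word])) {
        return $word;
    }
    $best = null;
    $bestPercent = 0.0;
    $len = strlen($word);
    foreach ($dict as $candidate) {
        // Shortcut 2: skip candidates whose length differs too much
        // (the default limit of 2 is an assumption, not a tuned value).
        if (abs(strlen($candidate) - $len) > $maxLenDiff) {
            continue;
        }
        similar_text($candidate, $word, $percent);
        if ($percent > $bestPercent) {
            $best = $candidate;
            $bestPercent = $percent;
        }
    }
    return $best;
}

$dict  = array("have", "been", "parliament", "the");
$known = array_flip($dict); // word => index, so isset() is a hash lookup

echo closest_word_fast("the", $dict, $known), "\n";      // exact hit, nothing is scored
echo closest_word_fast("pariamnt", $dict, $known), "\n"; // only "parliament" is scored
```

The length filter can reject the right candidate when a typo drops many letters, so it trades some accuracy for fewer similar_text() calls.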