Script to find and replace synonyms does not replace every occurrence

Time: 2018-07-25 16:56:08

Tags: php regex string str-replace synonym

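Before I start, I want to thank everyone who helps me with this problem. I spent the whole of last week looking for the error, but I could not solve it.

I wrote a script that finds words and phrases in an input string and replaces them with synonyms from a thesaurus database stored in an array.

For example:

There is some string with phrases to replace. Replacement should be made for every occurrence of a phrase, but for some reason, when the string contains more than one occurrence of a phrase or word and I try to replace them using a for loop and the substr_replace function, only the last occurrence is replaced.

Here is my code, with a comment next to each important part (I wrote the comments about the problem to make it easier to understand):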

<?php

session_start();

function first_run($string)
{
    if(!empty($_SESSION['iterrated_string']))
    {
        return $_SESSION['iterrated_string'];       
    }
    else
    {
        return $string;     
    }
}


//ARRAY WITH THESAURUS DATABASE. KEY IS ORIGINAL WORD AND VALUES ARE REPLACEMENTS
$thesaurus_database = array(
                             'phrase' => array('word', 'sentence'),
                             'replace' => array('change', 'overwrite'),
                             'occurrence' => array('instance', 'existance'),
                             'made' => array('make', 'do')
                            );


//THE ORIGINAL INPUT STRING
$string = 
'There is some string with phrases to replace. Replace should be made to every occurrence of phrase but for some reason when string have more than one occurrence of phrase or word and i want to replace it using for loop, only one last occurrence is replaced';



foreach(array_keys($thesaurus_database) as $single_record) //READING EVERY SINGLE RECORD FROM THESAURUS DATABASE AND TRYING TO FIND MATCH IN INPUT STRING
{   
    $string_one = first_run($string); //RETURNS THE ORIGINAL STRING ON THE FIRST ITERATION AND THE MODIFIED STRING SAVED IN THE SESSION ON LATER ITERATIONS

    if(preg_match_all("/\b$single_record\b/iu", $string_one, $matches, PREG_OFFSET_CAPTURE))//PREG MATCH ALL WITH RETURNED ARRAY OF FOUND MATCHES AND OFFSETS
    {   
        if(count($matches[0]) > 1) //WHEN MORE THAN ONE OCCURRENCE IS FOUND
        {               
            for($i = 0; $i <= count($matches[0])-1; $i++)//FOR LOOP THAT READS THE MATCHES ARRAY WITH OFFSETS AND REPLACES USING THE SUBSTR_REPLACE FUNCTION
            {   
                $replace_multi_occureence = $thesaurus_database[$single_record][rand(0,count($thesaurus_database[$single_record])-1)].'.multi_replace_marker';//VARIABLE WITH REPLACEMENT

                $string_two = substr_replace    (
                    $string_one,
                    $replace_multi_occureence,
                    $matches[0][$i][1],
                    strlen($matches[0][$i][0])
                    );
                    $_SESSION['iterrated_string'] = $string_two; //OVERWRITING SESSION VARIABLE WITH MODIFIED STRING FOR NEXT ITERATION.
            }

        }
        else //WHEN ONLY ONE OCCURRENCE IS FOUND
        {                               
            $replace_single_occurrence = $thesaurus_database[$single_record][rand(0,count($thesaurus_database[$single_record])-1)].'.single_replace_marker';//VARIABLE WITH REPLACEMENT                                 
            $string_two = substr_replace    (
                    $string_one,
                    $replace_single_occurrence,
                    $matches[0][0][1],
                    strlen($matches[0][0][0])
                    );
                    $_SESSION['iterrated_string'] = $string_two; //OVERWRITING SESSION VARIABLE WITH MODIFIED STRING FOR NEXT ITERATION.
        }

    }
}   

echo $string_two; //MODIFIED STRING

session_destroy();


?>  

Note the ".multi_replace_marker" and ".single_replace_marker" suffixes, which make it easier to see where a replacement happened!

What I expected: "There is some string with phrases to change.multi_replace_marker. overwrite.multi_replace_marker should be make.single_replace_marker to every instance.multi_replace_marker of phrase but for some reason when string have more than one existence.multi_replace_marker of sentence.single_replace_marker or word and i want to change.multi_replace_marker it using for loop, only one last occurrence is replaced"

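What the script gives me: "There is some string with phrases to replace. Replace should be do.single_replace_marker to every occurrence of phrase but for some reason when string have more than one occurrence of sentence.single_replace_marker or word and i want to change.multi_replace_marker it using for loop, only one last existance.multi_replace_marker is replaced"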

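Problem: Words and phrases that occur only once are replaced the way I want, but when a word occurs more than once and I try to replace it in the for loop, only the last occurrence is replaced. Why do I use substr_replace instead of preg_replace?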
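A minimal sketch of what appears to go wrong in the for loop, using a shortened sample sentence (an assumption, not the original input): every iteration calls substr_replace on the same unmodified string, so each result overwrites the previous one and only the last replacement survives.

```php
<?php
// Shortened sample text (hypothetical, not the original input string).
$source = 'replace this, then replace that';

// Same kind of call as in the script: collect every match with its byte offset.
preg_match_all('/\breplace\b/iu', $source, $matches, PREG_OFFSET_CAPTURE);

$result = $source;
foreach ($matches[0] as [$word, $offset]) {
    // Each call starts again from the untouched $source,
    // so the replacement made in the previous iteration is thrown away.
    $result = substr_replace($source, 'change', $offset, strlen($word));
}

echo $result, PHP_EOL; // replace this, then change that
```

The list syntax in foreach assumes PHP 7.1 or later.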

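Because I want each replacement to continue from where the previous one ended instead of starting from the beginning of the string every time. With a large thesaurus database and preg_replace, the replacements could overwrite each other.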
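One possible way to avoid replacements overwriting each other is to scan the string only once, so text that has already been replaced is never matched again. The sketch below uses preg_replace_callback with all thesaurus keys joined into one alternation. The helper name replace_synonyms is hypothetical, and the keys are assumed to be plain ASCII words.

```php
<?php
// Hypothetical helper, not part of the original script.
function replace_synonyms(string $text, array $thesaurus): string
{
    if (count($thesaurus) === 0) {
        return $text;
    }

    // Longest keys first, so a longer phrase wins over a shorter word it contains.
    $keys = array_keys($thesaurus);
    usort($keys, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    // Escape every key and join them into one alternation.
    $quoted = array_map(function ($key) {
        return preg_quote($key, '/');
    }, $keys);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/iu';

    // Lower-cased lookup table, because the pattern matches case-insensitively.
    $lookup = array_change_key_case($thesaurus, CASE_LOWER);

    $result = preg_replace_callback($pattern, function ($m) use ($lookup) {
        $synonyms = $lookup[strtolower($m[0])];
        return $synonyms[array_rand($synonyms)]; // pick a random synonym
    }, $text);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

// 'made' becomes 'make', but that new 'make' is not replaced again by 'do'.
echo replace_synonyms('Made to make', ['made' => ['make'], 'make' => ['do']]), PHP_EOL;
```

Because the engine continues after each match in the original text, a synonym that happens to be a thesaurus key itself is left alone.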

With substr_replace I can use an offset to specify where the replacement should start in each iteration. With preg_replace I cannot do that.
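If the offset-based approach is kept, a sketch along these lines might work: apply each replacement to the string that has already been modified, and walk the matches from last to first so that the offsets of the matches not yet handled stay valid. The helper name replace_by_offsets is hypothetical.

```php
<?php
// Hypothetical helper, not part of the original script.
function replace_by_offsets(string $text, string $word, array $synonyms): string
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/iu';

    if (!preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        return $text; // no match (or a regex error): nothing to replace
    }

    // Go from the last match to the first: changing the end of the string
    // does not move the byte offsets of the matches that come before it.
    foreach (array_reverse($matches[0]) as [$found, $offset]) {
        $replacement = $synonyms[array_rand($synonyms)];
        // Work on $text itself so earlier replacements are kept.
        $text = substr_replace($text, $replacement, $offset, strlen($found));
    }

    return $text;
}

echo replace_by_offsets('replace this, then replace that', 'replace', ['change']), PHP_EOL;
// change this, then change that
```

This handles one thesaurus key per call. Calling it in a loop over all keys can still replace a synonym that was inserted by an earlier key.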

0 answers:

No answers yet