Question

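I have code that compares a string against the values of an array and makes replacements using only words from that array:

First code (just an example):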

$myVar = 'essa pizza é muito gostosa, que prato de bom sabor';
$myWords=array(
    array('sabor','gosto','delicia'),
    array('saborosa','gostosa','deliciosa'),
);

foreach($myWords as $words){
    shuffle($words); // randomize the subarray
    // pipe-together the words and return just one match
    if(preg_match('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){
        // generate "replace_pair" from matched word and a random remaining subarray word
        // replace and preserve the new sentence
        $myVar=strtr($myVar,[$out[0]=>current(array_diff($words,$out))]);
    }
}
echo $myVar;

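My question:

I have a second piece of code that does not use rand/shuffle. I don't want randomness; I want precision in the replacement, always swapping column 0 for column 1. It is meant to always swap the values: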

// wrong output: $myVar = "minha irmã alanné é not aquela blnode, elere é a bom plperito";
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(array("is","é"),
    array("on","no"),
    array("that","aquela"),
    //array("blonde","loira"),
    //array("not","não"),
    array("sister","irmã"), 
    array("my","minha"),
    //array("nothing","nada"),
    array("myth","mito"),
    array("he","ele"),
    array("good","bom"),
    array("ace","perito"),
   // array("here","aqui"), // even when "here" is not in the list, the he=>ele pair must not turn "here" into the non-existent word "elere"
); 
$replacements = array_combine(array_column($myWords,0),array_column($myWords,1));
$myVar = strtr($myVar,$replacements);
echo $myVar;
// expected output:  minha irmã alannis é not aquela blonde, here é a bom place
//  avoid replace words slice!

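Expected output: minha irmã alannis é not aquela blonde, here é a bom place

In short: avoid replacing slices of words! Always check that the whole word exists in the array before making the substitution.

The code should check that what it replaces is a real word, one that exists in the array $myWords. That would avoid malformed words such as:

alanné, blnode, elere, plperito

Those 4 are not existing words; they are errors produced by the replacement. How can I do this for the second code?

In short, the swap must be made on complete words/keys, i.e. existing words, instead of creating weird things by slicing keywords!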
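A minimal sketch of the check described above (not taken from either answer): split the sentence into runs of word and non-word characters, and replace a run only when the whole run is a key of the lookup array. It assumes UTF-8 input and case-sensitive keys; $map, $tokens and $result are illustrative names.

```php
<?php
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords = [
    ["is", "é"], ["on", "no"], ["that", "aquela"], ["sister", "irmã"],
    ["my", "minha"], ["myth", "mito"], ["he", "ele"], ["good", "bom"], ["ace", "perito"],
];
// Build a word => translation lookup from columns 0 and 1.
$map = array_combine(array_column($myWords, 0), array_column($myWords, 1));

// Split into alternating word / non-word runs, keeping the delimiters
// so the sentence can be rebuilt exactly.
$tokens = preg_split('/(\W+)/u', $myVar, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($tokens as &$token) {
    // Only a token that is itself a key is replaced, so "he" never fires inside "here".
    if (isset($map[$token])) {
        $token = $map[$token];
    }
}
unset($token); // drop the reference left by the foreach

$result = implode('', $tokens);
echo $result, "\n";
```

Because the lookup is an exact array key match, no partial-word replacement can happen; the trade-off is that "My" and "my" are different keys in this sketch.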

Answer 1

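Unfortunately, strtr() is the wrong tool for this job because it is "word boundary ignorant". To target whole words, there is no simpler way than a regex pattern with word boundaries.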
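A small illustration of the difference, using the he/ele pair from the question (the sample sentence is made up):

```php
<?php
$sentence = "here he is";

// strtr() replaces the substring anywhere, so the start of "here" is rewritten too.
$byStrtr = strtr($sentence, ["he" => "ele"]);

// \b on both sides means "he" only matches as a complete word.
$byRegex = preg_replace('/\bhe\b/', 'ele', $sentence);

echo $byStrtr, "\n"; // elere ele is
echo $byRegex, "\n"; // here ele is
```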

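Furthermore, to ensure that longer strings are matched before shorter strings (ones that may exist inside other strings), you must sort $myWords by string length (descending, longest to shortest; use the multibyte version only when necessary).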

After sorting the word array and converting the search words into separate regex patterns, you can feed the arrays into the pattern and replacement parameters of preg_replace().

Code (Demo):

$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(
    array("is","é"),
    array("on","no"),
    array("that","aquela"),
    array("sister","irmã"),
    array("my","minha"),
    array("myth","mito"),
    array("he","ele"),
    array("good","bom"),
    array("ace","perito")
);
usort($myWords,function($a,$b){return mb_strlen($b[0])<=>mb_strlen($a[0]);});  // sort subarrays by first column multibyte length
// remove mb_ if first column holds no multi-byte characters.  strlen() is much faster.

foreach($myWords as &$words){
    $words[0]='/\b'.$words[0].'\b/i';  // generate patterns using search word, word boundaries, and case-insensitivity
}
unset($words);  // break the reference left over from the foreach loop

//var_export($myWords);
//var_export(array_column($myWords,0));
//var_export(array_column($myWords,1));

$myVar=preg_replace(array_column($myWords,0),array_column($myWords,1),$myVar);
echo $myVar;

Output:

minha irmã alannis é not aquela blonde, here é a bom place

What this does not do is respect the case of the matched substring. I mean, my and My will both be replaced by minha.

To accommodate different casing, you need to use preg_replace_callback().

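Here is an approach that takes casing into account (it handles words with a capitalized first letter, not all-uppercase words):

Code (Demo) <- run this code to see the original casing preserved after the replacement.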
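As a hypothetical sketch of that idea (not the author's linked demo), the callback below looks up the lowercased match and re-capitalizes the first letter of the translation when the matched word starts with an uppercase letter. It assumes PHP 7.4+ (for the arrow function) and the mbstring extension.

```php
<?php
$myVar = "My sister alannis is not that blonde, here is a good place";
$translations = [
    "is" => "é", "on" => "no", "that" => "aquela", "sister" => "irmã",
    "my" => "minha", "myth" => "mito", "he" => "ele", "good" => "bom", "ace" => "perito",
];

// One alternation of all keys, whole words only, case-insensitive.
$pattern = '/\b(?:' . implode('|', array_map(
    fn($w) => preg_quote($w, '/'),
    array_keys($translations)
)) . ')\b/iu';

$result = preg_replace_callback($pattern, function (array $m) use ($translations): string {
    $word = $m[0];
    $replacement = $translations[mb_strtolower($word, 'UTF-8')];
    $first = mb_substr($word, 0, 1, 'UTF-8');
    // Matched word starts uppercase: uppercase the translation's first letter too.
    if ($first !== mb_strtolower($first, 'UTF-8')) {
        return mb_strtoupper(mb_substr($replacement, 0, 1, 'UTF-8'), 'UTF-8')
            . mb_substr($replacement, 1, null, 'UTF-8');
    }
    return $replacement;
}, $myVar);

echo $result, "\n";
```

Only the first letter's case is carried over, so "MY" would come out as "Minha" here; handling all-uppercase words needs an extra branch.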

Answer 2

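My previous method was very inefficient. I didn't realize how much data you were processing, but once we are past 4000 lines, efficiency is crucial (I think my brain was still on the strtr()-related processing from your previous question). Here is my new, improved solution, which I hope leaves the previous one in the dust.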

代码：（Demo）

$myVar="My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
echo "$myVar\n";

$myWords=array(
    array("is","é"),
    array("on","no"),
    array("that","aquela"),
    array("sister","irmã"), 
    array("my","minha"),
    array("myth","mito"),
    array("he","ele"),
    array("good","bom"),
    array("ace","perito"),
    array("i","eu")  // notice I must be lowercase
);
$translations=array_combine(array_column($myWords,0),array_column($myWords,1));  // or skip this step and just declare $myWords as key-value pairs

// length sorting is not necessary: with a non-atomic group, a failed \b after "my" lets the engine go on to try "myth"
// preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)

$pattern='/\b(?:'.implode('|',array_keys($translations)).')\b/i';  // non-capturing group; an atomic (?>...) group would stop "myth" from ever matching after "my"
/* echo $pattern;
   makes: /\b(?:is|on|that|sister|my|myth|he|good|ace|i)\b/i
   demo: https://regex101.com/r/DXTtDf/1
*/
$translated=preg_replace_callback(
    $pattern,
    function($m)use($translations){  // bring $translations (lookup) array to function
        $encoding='UTF-8';  // default setting
        $key=mb_strtolower($m[0],$encoding);  // standardize keys' case for lookup accessibility
        if(ctype_lower($m[0])){ // treat as all lower
            return $translations[$m[0]];
        }elseif(mb_strlen($m[0],$encoding)>1 && ctype_upper($m[0])){  // treat as all uppercase
            return mb_strtoupper($translations[$key],$encoding);
        }else{  // treat as only first character uppercase
            return mb_strtoupper(mb_substr($translations[$key],0,1,$encoding),$encoding)  // uppercase first
                  .mb_substr($translations[$key],1,null,$encoding);  // append the remaining characters unchanged
        }
    },
    $myVar);

echo $translated;

Output:

My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!

This method:

Makes only 1 pass over $myVar, instead of 1 pass for every subarray of $myWords.
Does not bother sorting the lookup array ($myWords / $translations).
Does not bother with regex escaping (preg_quote()) or making pattern components literal (\Q..\E), because only words are translated.
Uses word boundaries so that only whole-word matches are replaced.
Uses a plain non-capturing group rather than an atomic group, because an atomic group forbids backtracking into the alternation: once my matched, myth could never be tried.
Declares the $encoding value for stability/maintainability/reusability.
Matches case-insensitively but replaces case-sensitively. If the English match is:
1. all lowercase, it is replaced with the lowercase translation
2. all uppercase (and longer than a single character), it is replaced with the uppercase translation
3. anything else (for example Is, or the single letter I), it is replaced with the translation's first character uppercased
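One caveat worth checking with any pattern built from an alternation of words: inside an atomic group (?>...), once an alternative such as my succeeds, the engine will not go back and try a longer alternative such as myth when the following \b fails. A minimal check:

```php
<?php
// Atomic group: "my" matches, \b then fails before "t", and the group is never retried with "myth".
$atomic = preg_match('/\b(?>my|myth)\b/', 'myth', $m1);

// Ordinary non-capturing group: the failed \b makes the engine backtrack and try "myth".
$plain = preg_match('/\b(?:my|myth)\b/', 'myth', $m2);

var_dump($atomic); // int(0)
var_dump($plain);  // int(1)
echo $m2[0], "\n"; // myth
```

Either a non-atomic group or listing longer alternatives before their prefixes avoids the problem.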

Swapping a variable's value using the values of an array, but under a condition
