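I have code that compares the output against the values of an array and only completes the operation using words from that array:
First code (just an example):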
$myVar = 'essa pizza é muito gostosa, que prato de bom sabor';
$myWords=array(
array('sabor','gosto','delicia'),
array('saborosa','gostosa','deliciosa'),
);
foreach($myWords as $words){
shuffle($words); // randomize the subarray
// pipe-together the words and return just one match
if(preg_match('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){
// generate "replace_pair" from matched word and a random remaining subarray word
// replace and preserve the new sentence
$myVar=strtr($myVar,[$out[0]=>current(array_diff($words,$out))]);
}
}
echo $myVar;
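My problem:
I have a second piece of code that does not use rand/shuffle (I don't want randomness, I want precise replacements: I always change column 0 into column 1). It should always swap the values: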
// wrong output: $myVar = "minha irmã alanné é not aquela blnode, elere é a bom plperito";
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(array("is","é"),
array("on","no"),
array("that","aquela"),
//array("blonde","loira"),
//array("not","não"),
array("sister","irmã"),
array("my","minha"),
//array("nothing","nada"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito"),
// array("here","aqui"), //if [here] it does not exist, it is not to do replacement from the line he=ele = "elere" non-existent word
);
$replacements = array_combine(array_column($myWords,0),array_column($myWords,1));
$myVar = strtr($myVar,$replacements);
echo $myVar;
// expected output: minha irmã alannis é not aquela blonde, here é a bom place
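// avoid replacing slices of words!
Expected output: minha irmã alannis é not aquela blonde, here é a bom place
Always check that the word exists in the array before making the substitution.
The replacement should only happen when the matched text is a real, complete word that exists in the $myWords array. That avoids errors like:
alanné, blnode, elere, plperito
Those 4 are not existing words, just mistakes. How do you do this for the second code?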
In short, the swap must be made on complete words/keys, i.e. existing words, and must not create weird things by replacing slices of words!
Answer 0 (score: 1)
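Unfortunately, strtr() is the wrong tool for this job because it is "word-boundary ignorant". To target whole words, there is no simpler way than a regex pattern with word boundaries.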
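A minimal sketch of the difference, reusing the question's pairs (the commented-out ones omitted) as a key/value array. It assumes PHP 7.4+ for the arrow functions and builds one alternation pattern wrapped in \b, which is one way to apply word boundaries, not necessarily the answer's own construction:

```php
<?php
// Pairs copied from the question (commented-out pairs omitted).
$translations = [
    'is' => 'é', 'on' => 'no', 'that' => 'aquela', 'sister' => 'irmã',
    'my' => 'minha', 'myth' => 'mito', 'he' => 'ele', 'good' => 'bom', 'ace' => 'perito',
];
$sentence = 'my sister alannis is not that blonde, here is a good place';

// strtr() has no notion of words: it also replaces "is" in "alannis",
// "on" in "blonde", "he" in "here" and "ace" in "place".
$sliced = strtr($sentence, $translations);

// \b...\b only matches complete words; the callback looks up the matched word.
$keys = array_map(fn(string $k): string => preg_quote($k, '/'), array_keys($translations));
$pattern = '/\b(?:' . implode('|', $keys) . ')\b/';
$wholeWords = preg_replace_callback(
    $pattern,
    fn(array $m): string => $translations[$m[0]],
    $sentence
);

echo $sliced, "\n";     // minha irmã alanné é not aquela blnode, elere é a bom plperito
echo $wholeWords, "\n"; // minha irmã alannis é not aquela blonde, here é a bom place
```

The first line reproduces the "wrong output" from the question; the second is the expected output.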
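Furthermore, to ensure that longer strings are matched before shorter strings (which may occur inside other strings), you must sort $myWords by string length (descending, longest to shortest; use the multibyte version only when necessary).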
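To illustrate why order matters when keys share a prefix (my / myth), here is a small sketch using a single alternation pattern. This is an assumption for illustration: the code below in this answer builds one pattern per word instead. It also shows that an atomic group cannot recover from a wrong early choice even with \b:

```php
<?php
$words = ['is', 'my', 'myth', 'he'];

// The leftmost alternative that matches wins: unsorted, "my" is found inside "myth".
preg_match('/(?:' . implode('|', $words) . ')/', 'myth', $unsorted);

// Sort longest-first (strlen() is enough here because the keys are ASCII).
usort($words, fn(string $a, string $b): int => strlen($b) <=> strlen($a));
preg_match('/(?:' . implode('|', $words) . ')/', 'myth', $sorted);

// An atomic group commits to "my"; \b then fails and the engine may not
// backtrack into the group to try "myth", so nothing matches.
$atomic = preg_match('/\b(?>my|myth)\b/', 'myth');

echo $unsorted[0], "\n"; // my
echo $sorted[0], "\n";   // myth
echo $atomic, "\n";      // 0
```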
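After sorting the word array and converting the search words into individual regex patterns, you can feed the arrays into the pattern and replace parameters of preg_replace().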
Code: (Demo)
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(
array("is","é"),
array("on","no"),
array("that","aquela"),
array("sister","irmã"),
array("my","minha"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito")
);
usort($myWords,function($a,$b){return mb_strlen($b[0])<=>mb_strlen($a[0]);}); // sort subarrays by first column multibyte length
// remove mb_ if first column holds no multi-byte characters. strlen() is much faster.
foreach($myWords as &$words){
$words[0]='/\b'.$words[0].'\b/i'; // generate patterns using search word, word boundaries, and case-insensitivity
}
//var_export($myWords);
//var_export(array_column($myWords,0));
//var_export(array_column($myWords,1));
$myVar=preg_replace(array_column($myWords,0),array_column($myWords,1),$myVar);
echo $myVar;
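Output:
minha irmã alannis é not aquela blonde, here é a bom place
What this does not do is respect the case of the matched substring. I mean, my and My will both be replaced with minha.
To accommodate different casing, you need to use preg_replace_callback().
Here is a consideration (it handles words with an uppercase first letter, not all-uppercase words):
Code: (Demo) <- run this code to see the original casing preserved after replacement.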
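The linked demo code is not preserved in this copy. The following is only a sketch of one way to restore an uppercase first letter with preg_replace_callback(), not the answer's original demo. It assumes lowercase ASCII keys, UTF-8 text, and the mbstring extension:

```php
<?php
$myVar = 'My sister alannis Is not That blonde, here is a good place';
$translations = [
    'is' => 'é', 'on' => 'no', 'that' => 'aquela', 'sister' => 'irmã',
    'my' => 'minha', 'myth' => 'mito', 'he' => 'ele', 'good' => 'bom', 'ace' => 'perito',
];

// Case-insensitive whole-word alternation.
$pattern = '/\b(?:' . implode('|', array_keys($translations)) . ')\b/i';

$result = preg_replace_callback($pattern, function (array $m) use ($translations): string {
    $word = $m[0];
    $replacement = $translations[strtolower($word)]; // keys are lowercase ASCII
    // If the matched word starts with an uppercase letter, uppercase the first
    // (possibly multibyte) character of the replacement, e.g. "é" -> "É".
    if (ctype_upper($word[0])) {
        return mb_strtoupper(mb_substr($replacement, 0, 1, 'UTF-8'), 'UTF-8')
            . mb_substr($replacement, 1, null, 'UTF-8');
    }
    return $replacement;
}, $myVar);

echo $result, "\n"; // Minha irmã alannis É not Aquela blonde, here é a bom place
```

All-uppercase words such as MY would still come out as "Minha" here; the next answer handles that case too.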
Answer 1 (score: 1)
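My previous approach was very inefficient. I didn't realize how much data you were processing, but if we are over 4000 lines, efficiency is crucial (I think my brain was stuck on strtr()-related processing because of your previous question). Here is my new, improved solution, which I hope leaves the previous one in the dust.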
Code: (Demo)
$myVar="My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
echo "$myVar\n";
$myWords=array(
array("is","é"),
array("on","no"),
array("that","aquela"),
array("sister","irmã"),
array("my","minha"),
array("myth","mito"),
array("he","ele"),
array("good","bom"),
array("ace","perito"),
array("i","eu") // notice I must be lowercase
);
$translations=array_combine(array_column($myWords,0),array_column($myWords,1)); // or skip this step and just declare $myWords as key-value pairs
// length sorting is not necessary: the non-capturing group can backtrack, so "myth" still matches even though "my" is listed first
// preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)
$pattern='/\b(?:'.implode('|',array_keys($translations)).')\b/i'; // non-capturing group; an atomic group (?>...) would commit to "my" and never match "myth"
/* echo $pattern;
makes: /\b(?:is|on|that|sister|my|myth|he|good|ace|i)\b/i
demo: https://regex101.com/r/DXTtDf/1
*/
$translated=preg_replace_callback(
$pattern,
function($m)use($translations){ // bring $translations (lookup) array to function
$encoding='UTF-8'; // default setting
$key=mb_strtolower($m[0],$encoding); // standardize keys' case for lookup accessibility
if(ctype_lower($m[0])){ // treat as all lower
return $translations[$m[0]];
}elseif(mb_strlen($m[0],$encoding)>1 && ctype_upper($m[0])){ // treat as all uppercase
return mb_strtoupper($translations[$key],$encoding);
}else{ // treat as only first character uppercase
return mb_strtoupper(mb_substr($translations[$key],0,1,$encoding),$encoding) // uppercase first
.mb_substr($translations[$key],1,mb_strlen($translations[$key],$encoding)-1,$encoding); // append remaining lowercase
}
},
$myVar);
echo $translated;
Output:
My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!
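This method:
- makes only one pass over $myVar, rather than one pass per subarray of $myWords.
- does not bother sorting the lookup array ($myWords / $translations).
- does not escape regex characters (preg_quote()) or make pattern components literal (\Q..\E), because only words are translated.
- keeps the encoding used by every mb_* call in a single $encoding value (UTF-8 by default), so it can be changed in one place.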
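If the lookup keys could ever contain regex metacharacters (not the case for the words above), escaping would become necessary. A small sketch with a hypothetical key "e.g", where the unescaped "." matches any character:

```php
<?php
// Hypothetical lookup key containing a regex metacharacter (".").
$translations = ['e.g' => 'p.ex'];
$text = 'e.g and exg';
$keys = array_keys($translations);

// Unescaped: "." matches any character, so "exg" is matched as well.
preg_match_all('/\b(?:' . implode('|', $keys) . ')\b/', $text, $raw);

// preg_quote() escapes "." (and the "/" delimiter) so the key is matched literally.
$quoted = array_map(fn(string $k): string => preg_quote($k, '/'), $keys);
preg_match_all('/\b(?:' . implode('|', $quoted) . ')\b/', $text, $safe);

echo implode(', ', $raw[0]), "\n";  // e.g, exg
echo implode(', ', $safe[0]), "\n"; // e.g
```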