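How can I separate glued strings (words missing the whitespace between them), using the keys in an array to check whether they are glued?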

Time: 2017-12-14 23:53:56

Tags: php arrays replace whitespace


Glued: sisteralannis, goodplace (should become: sister alannis, good place)

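Note: both glued words start with an existing key from the array (sister, good), but as a whole they are not valid keys, so no replacement happens. I therefore need to split them apart so that the replacement can take place in the next step of the script. Another solution would be to delete everything that is not exactly equal to a key in $myWords.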

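This code performs the string replacement. I want to improve it with code that checks whether words are glued together and inserts a space between them to separate them: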

$myVar = "my sisteralannis is not that blonde, here is a goodplace";
$myWords=array(
    array("is","é"),
    array("on","no"),
    array("that","aquela"),
    array("sister","irmã"), 
    array("my","minha"),
    array("myth","mito"),
    array("he","ele"),
    array("good","bom"),
    array("ace","perito")
); 
usort($myWords,function($a,$b){return mb_strlen($b[0])<=>mb_strlen($a[0]);});  // sort subarrays by first column multibyte length
// remove mb_ if first column holds no multi-byte characters.  strlen() is much faster.

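foreach($myWords as &$words){
    $words[0]='/\b'.preg_quote($words[0],'/').'\b/ui';  // generate patterns using search word, word boundaries, and case-insensitivity
}
unset($words);  // break the reference so a later use of $words cannot modify $myWords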

$myVar=preg_replace(array_column($myWords,0),array_column($myWords,1),$myVar);
 //APPLY SECOND SOLUTION HERE

echo $myVar;

Expected output: minha irmã alannis é not aquela blonde, here é a bom place
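One possible way to approach the splitting step is sketched below. It is only a heuristic, not a confirmed solution: the helper name `splitGluedWords` and the `$minLength` parameter are assumptions introduced for illustration. The minimum length is there because short keys such as "he" would otherwise split ordinary words like "here" into "he re".

```php
<?php
// Hypothetical helper: insert a space after any key that is glued to the front of a longer word.
// $minLength is an assumption; it keeps short keys ("he", "is", "on") from splitting normal words.
function splitGluedWords(string $text, array $keys, int $minLength = 4): string
{
    // keep only keys long enough to be trusted as prefixes
    $keys = array_filter($keys, function ($k) use ($minLength) {
        return mb_strlen($k) >= $minLength;
    });
    if (!$keys) {
        return $text;  // nothing to split on
    }
    // longest keys first so the alternation prefers the longest prefix
    usort($keys, function ($a, $b) { return mb_strlen($b) <=> mb_strlen($a); });
    $alternation = implode('|', array_map(function ($k) { return preg_quote($k, '/'); }, $keys));
    // a key at the start of a word, immediately followed by another letter, is treated as glued
    $pattern = '/\b(' . $alternation . ')(?=\pL)/ui';
    return preg_replace($pattern, '$1 ', $text);
}

$myVar = "my sisteralannis is not that blonde, here is a goodplace";
$keys  = ["is", "on", "that", "sister", "my", "myth", "he", "good", "ace"];
echo splitGluedWords($myVar, $keys);
// my sister alannis is not that blonde, here is a good place
```

Running the replacement code from the question on this result should then translate "sister" and "good" as well. Note the trade-off: any real word that merely begins with a long key (for example "mythology" with the key "myth") would also be split, so the threshold may need tuning for other inputs.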

=================

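The second solution is simpler: match $myVar against $myWords and delete anything that does not exist in $myWords.

This removes every word in the variable that is not found in the array.

Output: minha é aquela, é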

1 Answer:

Answer 0: (score: 1)

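I won't claim to be 100% confident that this handles every possible case, but it does work for your input string, and I built it to accommodate words with an uppercase first letter. Beyond that, there may be edge cases that need some adjustment.

The inline comments should help with understanding the code.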

Code: (Demo)

$myVar = "My sisteralannis is not that blonde, here is a goodplace";
$myWords=[["is","é"],["on","no"],["that","aquela"],["sister","irmã"],["my","minha"],
          ["myth","mito"],["he","ele"],["good","bom"],["ace","perito"]];
usort($myWords,function($a,$b){return strlen($b[0])<=>strlen($a[0]);});  // longer English words before shorter
$search=array_column($myWords,0);  // cache for multiple future uses

//input: "My sisteralannis is not that blonde, here is a goodplace";
//filter: ++ ------------- ++ --- ++++ ------  ---- ++ - ---------
//output: Minha            é      aquela     ,      é

$disqualifying_pattern='/ ?\b(?>'.implode('|',$search).')\b(*SKIP)(*FAIL)| ?[a-z]+/i';  // this handles the spaces for the sample input, might not work for all cases
//echo $disqualifying_pattern,"\n";
$filtered=preg_replace($disqualifying_pattern,'',$myVar);
//echo $filtered,"\n";

$patterns=array_map(function($v){return '/\b'.$v.'\b/i';},$search);
$replace=array_column($myWords,1);
echo preg_replace_callback(
        $patterns,
        function($m)use($patterns,$replace){
            $new=preg_replace($patterns,$replace,$m[0],1); // tell it to stop after replacing once
            if(ctype_upper($m[0][0])){  // if first letter of English word is uppercase
                $mb_ucfirst=mb_strtoupper(mb_substr($new,0,1));  // target and make upper, first letter of Portuguese word
                return $mb_ucfirst.mb_substr($new, 1); // apply new uppercase letter to the rest of the Portuguese word
            }
            return $new;
        },
        $filtered
    );

Output:

Minha é aquela, é
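
As an alternative, the same filter-and-translate result can be sketched in a single pass with a lookup table instead of two regex passes. This is a minimal sketch and not the answer's method: the helper name `translateKnownWords` is hypothetical, and it assumes PHP 7.1+ and the mbstring extension. Because each word is looked up exactly once, a translated word cannot be translated a second time by a later pattern.

```php
<?php
// Hypothetical helper: drop unknown words and translate known ones in one pass.
function translateKnownWords(string $text, array $pairs): string
{
    // build a lowercase lookup table: English word => Portuguese word
    $map = [];
    foreach ($pairs as [$english, $portuguese]) {
        $map[mb_strtolower($english)] = $portuguese;
    }
    // match each run of letters together with one optional leading space
    return preg_replace_callback('/( ?)(\pL+)/u', function ($m) use ($map) {
        $key = mb_strtolower($m[2]);
        if (!isset($map[$key])) {
            return '';  // unknown word: remove it along with its leading space
        }
        $new = $map[$key];
        // carry over a leading capital letter from the matched word
        $first = mb_substr($m[2], 0, 1);
        if ($first !== mb_strtolower($first)) {
            $new = mb_strtoupper(mb_substr($new, 0, 1)) . mb_substr($new, 1);
        }
        return $m[1] . $new;  // keep the leading space of words that survive
    }, $text);
}

$myVar   = "My sisteralannis is not that blonde, here is a goodplace";
$myWords = [["is","é"],["on","no"],["that","aquela"],["sister","irmã"],["my","minha"],
            ["myth","mito"],["he","ele"],["good","bom"],["ace","perito"]];
echo translateKnownWords($myVar, $myWords);
// Minha é aquela, é
```

Like the answer's pattern, the space handling here is tuned to the sample input and may need adjusting for other punctuation.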