Insert HTML formatting from one string into another string

Time: 2014-10-29 14:19:14

Tags: php html string

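I have two strings. One of them contains <em> tags, is entirely lowercase, and contains no delimiters or common words such as 'and', 'in', etc., while the other has none of these restrictions. An example: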

$str1 = 'world <em>round</em>';
$str2 = 'World - is Round';

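I want to turn $str2 into 'World - is <em>Round</em>' by checking which lowercase words in $str1 are wrapped in <em> tags. So far I have done the following, but it fails if the number of words is not equal in both strings.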

public static function applyHighlighingOnDisplayName($str1, $str2) {
    $str1_w = explode(' ', $str1);
    $str2_w = explode(' ', $str2);
    for ($i=0; $i<count($str1_w); $i++) {
       if (strpos($str1_w[$i], '<em>') !== false) {
            $str2_w[$i] = '<em>' . $str2_w[$i] . '</em>';
       }
    }
    return implode(' ', $str2_w);
}

$str1 = '<em>cup</em> <em>cakes</em>' & $str2 = 'Cup Cakes':

applyHighlighingOnDisplayName($str1, $str2) : '<em>Cup</em> <em>Cakes</em>': Correct

$str1 = 'cup <em>cakes</em>' & $str2 = 'The Cup Cakes':

applyHighlighingOnDisplayName($str1, $str2) : 'The <em>Cup</em> Cakes': Incorrect

How should I change my approach?

3 answers:

Answer 0 (score: 1)

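Your current approach depends on the number of words in the strings; a better solution is to let regular expressions do the matching for you. The following version works safely even when an emphasized word is a substring of another emphasized term (e.g. "cat" and "cat's cradle" or "cat litter").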

function applyHighlighingOnDisplayName($str1, $str2) {

    # if we have strings surrounded by <em> tags...
    if (preg_match_all("#<em>(.+?)</em>#", $str1, $match)) {

        ## sort the match strings by length, descending
        usort($match[1], function($a,$b){ return strlen($b) - strlen($a); } );

        # all the match words are in $match[1]
        foreach ($match[1] as $m) {
            # replace every match with a string that is very unlikely to occur
            # this prevents \b matching the start or end of <em> and </em>
            # preg_quote() escapes regex metacharacters (and the # delimiter) in $m
            $str2 = preg_replace("#\b(" . preg_quote($m, '#') . ")\b#i",
                "ZZZZ$1ZZZZ",
                $str2);
        }
        # replace ZZZZ with the <em> tags
        return preg_replace("#ZZZZ(.*?)ZZZZ#", "<em>$1</em>", $str2);
    }
    return $str2;
}

$str1 = 'cup <em>cakes</em>';
$str2 = 'Cup Cakes';
$str3 = 'The Cup Cakes';

print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;
print applyHighlighingOnDisplayName($str1, $str3) . PHP_EOL;

Output:

Cup <em>Cakes</em>
The Cup <em>Cakes</em>

Two strings with no <em>'d words:

$str1 = 'cup cakes';
$str2 = 'Cup Cakes';

print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;

Output:

Cup Cakes

Now for something trickier: lots of short words, one of which is a substring of all the others:

$str1 = '<em>i</em> <em>if</em> <em>in</em> <em>i\'ve</em> <em>is</em> <em>it</em>';
$str2 = 'I want to make the str2 as "World - is Round", by comparing which lowercase word in the str1 contains the em tag. So far, I\'ve done the following, but it fails if number of words aren\'t equal in both strings.';

Output:

<em>I</em> want to make the str2 as "World - <em>is</em> Round", by comparing which lowercase word <em>in</em> the str1 contains the em tag. So far, <em>I've</em> done the following, but <em>it</em> fails <em>if</em> number of words aren't equal <em>in</em> both strings.
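
The ZZZZ placeholders exist only to stop later passes from matching inside tags that earlier passes inserted. A possible alternative, sketched here under the assumption that one combined pattern is acceptable, is to join all emphasized words into a single alternation and scan the display string once. The function name `highlightInOnePass` and its parameter names are made up for this sketch and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the answer above): highlight in a single
// pass with one alternation instead of temporary ZZZZ markers.
function highlightInOnePass(string $marked, string $display): string
{
    // Collect the text of every <em>...</em> span in the marked-up string.
    if (!preg_match_all('#<em>(.+?)</em>#', $marked, $found)) {
        return $display; // nothing is emphasized
    }
    $words = array_unique($found[1]);

    // Longest first, so "i've" is tried before "i" at the same position.
    usort($words, function ($a, $b) { return strlen($b) - strlen($a); });

    // Escape regex metacharacters (and the # delimiter) in each word.
    $quoted = array_map(function ($w) { return preg_quote($w, '#'); }, $words);

    $pattern = '#\b(?:' . implode('|', $quoted) . ')\b#i';

    // The subject is scanned once, so inserted tags are never re-matched.
    return preg_replace_callback($pattern, function ($m) {
        return '<em>' . $m[0] . '</em>';
    }, $display);
}

echo highlightInOnePass('cup <em>cakes</em>', 'The Cup Cakes'), PHP_EOL;
// prints: The Cup <em>Cakes</em>
```

As with the version above, `\b` only behaves as a word boundary when the emphasized term starts and ends with a word character.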

Answer 1 (score: 1)

As others have said, regular expressions are the solution. Here is a working example with detailed comments:

$string1 = 'world <em>round</em>';
$string2 = 'World is - Round';

// extract what's in between <em> and </em> - it will be stored in $matches[1]
preg_match('/<em>(.+)<\/em>/i', $string1, $matches);

if (!$matches) {
    echo 'The first string does not contain <em>';
    exit();
}

// replace what we found in the previous operation
$newString = preg_replace('/\b' . preg_quote($matches[1], '/') . '\b/i', '<em>$0</em>', $string2);
echo $newString;

Later edit, covering multiple matches:

$string1 = 'world <em>round</em> not <em>flat</em>';
$string2 = 'World is - Round not Flat! Round, ok?';

// extract what's in between <em> and </em> - it will be stored in $matches[1]
preg_match_all('/<em>(.+?)<\/em>/i', $string1, $matches);

if (empty($matches[1])) { // preg_match_all always fills $matches, so test the capture list
    echo 'The first string does not contain <em>';
    exit();
}

foreach ($matches[1] as $match) {
    // replace what we found in the previous operation
    $string2 = preg_replace('/\b' . preg_quote($match, '/') . '\b/i', '<em>$0</em>', $string2);
}

echo $string2;
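
Because the pattern is built from the emphasized text, any regex metacharacter in that text is read as regex syntax unless it is escaped. A small illustration of what `preg_quote` changes, using a made-up term that contains a dot (the term and the sentence are assumptions chosen for this demonstration):

```php
<?php
$term = '3.5';                                   // hypothetical emphasized term
$text = 'Version 345 replaced version 3.5';

// Unescaped: "." matches any character, so "345" is highlighted as well.
$loose = preg_replace('/\b' . $term . '\b/i', '<em>$0</em>', $text);

// Escaped: preg_quote() turns "." into "\." and, because "/" is passed as
// the second argument, would also escape the pattern delimiter.
$strict = preg_replace('/\b' . preg_quote($term, '/') . '\b/i', '<em>$0</em>', $text);

echo $loose, PHP_EOL;   // Version <em>345</em> replaced version <em>3.5</em>
echo $strict, PHP_EOL;  // Version 345 replaced version <em>3.5</em>
```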

Answer 2 (score: 0)

This happens because your highlighting code expects a 1:1 correspondence between word positions in the two strings:

cup <em>cakes</em>
 1        2
Cup     Cakes

But in your failing sample:

cup <em>cakes</em>
 1        2            3
The      Cup         Cakes

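That is, you find <em> at word #2, so you highlight word #2 in the other string, but in that string word #2 is Cup.

A better algorithm would be to strip the HTML from the original string, so you end up with just cup cakes. Then look for cup cakes in the other string and highlight the second word at that location. This compensates for any "motion" within the string caused by extra (or fewer) words.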
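
One possible reading of this idea is sketched below. It is not the answerer's code: instead of a single substring search it uses a greedy, in-order word alignment, so that words missing from the marked-up string (such as "-" or "is" in the question's example) are skipped. The function name `highlightByAlignment` is made up, and the sketch assumes each <em> wraps exactly one word and that words in the display string carry no attached punctuation.

```php
<?php
// Hypothetical helper illustrating the idea; all names are made up here.
function highlightByAlignment(string $marked, string $display): string
{
    // Words of the marked-up string, each flagged if it was wrapped in <em>.
    $source = [];
    foreach (preg_split('/\s+/', trim($marked), -1, PREG_SPLIT_NO_EMPTY) as $token) {
        $source[] = [
            'word' => strtolower(strip_tags($token)),
            'em'   => strpos($token, '<em>') !== false,
        ];
    }

    $target = preg_split('/\s+/', trim($display), -1, PREG_SPLIT_NO_EMPTY);
    $next = 0; // index of the next source word still to be located

    foreach ($target as $i => $word) {
        if ($next >= count($source)) {
            break; // every source word has been placed
        }
        // Greedy in-order match; extra words in $display are simply skipped.
        if (strtolower($word) === $source[$next]['word']) {
            if ($source[$next]['em']) {
                $target[$i] = '<em>' . $word . '</em>';
            }
            $next++;
        }
    }

    // Note: runs of whitespace in $display are collapsed to single spaces.
    return implode(' ', $target);
}

echo highlightByAlignment('cup <em>cakes</em>', 'The Cup Cakes'), PHP_EOL;
// prints: The Cup <em>Cakes</em>
echo highlightByAlignment('world <em>round</em>', 'World - is Round'), PHP_EOL;
// prints: World - is <em>Round</em>
```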