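I have some HTML files created by a buggy online HTML editor. The user selects some text and presses the italic button, and the text is wrapped in <em></em> tags.
With this feature, users sometimes make text italic, remove the formatting, and then make it italic again.
As a result, I often receive broken HTML with duplicated tags, like this:
Example 1: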
Adding insult to injury, <em><em>Jennifer <a href="somelink">Aniston</a></em> had literally <a href="somelink2">zero clue</a> what was coming.</em>
Example 2:
Adding insult to injury, <em><em>Jennifer Aniston</em> had literally <a href="somelink2">zero clue</a> what was coming.</em>
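The question is how to remove the duplicated tags: an <em> tag inside another <em> tag is unnecessary and should be removed.
I wrote some code, but it doesn't work well. A good solution would probably use a regular expression; I tried several, but none of them worked, so I took a different approach: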
function repairDoubleTags($line = '', $rtag = 'em') {
    if(empty($line)) return false;
    if(!preg_match("#<".$rtag.">#", $line))
        return $line;
    $tmp = explode(" ", $line);
    //print_r($tmp);
    $lastposition = -1;
    $remove_next = 0;
    foreach($tmp as $nr => $word) {
        //echo $word."\r\n";
        if(empty($word)) {
            unset($tmp[$nr]);
            continue;
        }
        if(preg_match("#<".$rtag.">#", $word)) {
            if($lastposition == -1) {
                $lastposition = $nr;
                //echo "----------------- ".$rtag." FOUND\r\n";
            }else {
                $tmp[$nr] = trim(preg_replace("#<".$rtag.">#", "", $tmp[$nr]));
                $remove_next = 1;
                $lastposition = -1;
                //echo "----------------- DOUBLE ".$rtag." FOUND AND REMOVED\r\n";
            }
        }
        if(preg_match("#</".$rtag.">#", $word)) {
            if($remove_next == 1) {
                $tmp[$nr] = trim(preg_replace("#</".$rtag.">#", "", $tmp[$nr]));
                $remove_next = 0;
                //echo "----------------- DOUBLE END ".$rtag." FOUND AND REMOVED\r\n";
            }else {
                $lastposition = -1;
            }
        }
        if(empty($tmp[$nr]))
            unset($tmp[$nr]);
    }
    //print_r($tmp);
    $line = join(' ', $tmp);
    //print_r($line);
    //exit;
    return $line;
}
However, this code does not work if the HTML contains more than one <em> element. For example, it fails in this case:
Adding insult to injury, <em><em>Jennifer Aniston</em> had literally <a href="somelink2">zero clue</a> what <em>was coming</em>.</em>
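Does any regex expert have a quick, clean solution?
Thanks!
Answer 0 (score: -1)
It is hard to guess what other invalid <em> nesting we might encounter here, but if you want to explore a regex option, we could start with an expression like this: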
(?=<em><em>)(<em>)(.*?)(<\/em>)
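and replace each match with $2. This is only an example, and the expression can certainly fail on other inputs.
If there may be invalid tags other than em, we can simply loop over the expressions and apply the replacement for each one.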
$re = '/(?=<em><em>)(<em>)(.*?)(<\/em>)/m';
$str = 'Adding insult to injury, <em><em>Jennifer <a href="somelink">Aniston</a></em> had literally <a href="somelink2">zero clue</a> what was coming.</em>
Adding insult to injury, <em><em>Jennifer Aniston</em> had literally <a href="somelink2">zero clue</a> what was coming.</em>
Adding insult to injury, <em><em>Jennifer Aniston</em> had literally <a href="somelink2">zero clue</a> what was coming.</em>
';
$subst = '$2';
$result = preg_replace($re, $subst, $str);
echo $result;
Please see the demo for additional explanation.
Adding insult to injury, <em>Jennifer <a href="somelink">Aniston</a> had literally <a href="somelink2">zero clue</a> what was coming.</em>
Adding insult to injury, <em>Jennifer Aniston had literally <a href="somelink2">zero clue</a> what was coming.</em>
Adding insult to injury, <em>Jennifer Aniston had literally <a href="somelink2">zero clue</a> what was coming.</em>
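The expression above only covers a doubled opening tag at the very start of the italic run, so the question's last example (a second <em> pair further inside the outer one) would still come out nested. One possible way around that is to count nesting depth instead of matching pairs. The following is only a sketch under some assumptions: the function name stripNestedTags is made up for illustration, the tags are assumed to be balanced, and self-closing forms or tags inside comments and attribute values are not handled.

```php
<?php
// Hypothetical helper (not from the original post): keeps only the outermost
// <tag>...</tag> pair and drops any pair opened while another one is still open.
function stripNestedTags(string $html, string $tag = 'em'): string
{
    $depth = 0;
    // Matches "<em>", "<em class=...>" and "</em>", but not "<embed>".
    $pattern = '#<(/?)' . preg_quote($tag, '#') . '\b[^>]*>#i';

    $result = preg_replace_callback($pattern, function (array $m) use (&$depth) {
        if ($m[1] === '') {
            // Opening tag: keep it only when nothing is open yet.
            $depth++;
            return $depth === 1 ? $m[0] : '';
        }
        if ($depth === 0) {
            // Closing tag without an opener: left untouched in this sketch.
            return $m[0];
        }
        // Closing tag: keep it only when it closes the outermost pair.
        $depth--;
        return $depth === 0 ? $m[0] : '';
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$line = 'Adding insult to injury, <em><em>Jennifer Aniston</em> had literally '
      . '<a href="somelink2">zero clue</a> what <em>was coming</em>.</em>';
echo stripNestedTags($line), "\n";
```

To cover tags other than em, the same function could be called in a loop, for example foreach (['em', 'strong'] as $t) { $line = stripNestedTags($line, $t); }. For HTML that is more badly broken than this, a real parser such as PHP's DOM extension is likely a safer choice than any regex.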