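The input is the first paragraph of a Wikipedia page. I want to remove the parentheses and anything between them.
However, sometimes (often, in fact) the HTML inside the parentheses itself contains one or more parentheses, usually in the href="" of a link.
Take the following: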
<p>
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> (from Greek σαρξ <i>sarx</i>, flesh, and πτερυξ <i>pteryx</i>, fin) – sometimes considered synonymous with <b>Crossopterygii</b> ("fringe-finned fish", from Greek κροσσός <i>krossos</i>, fringe) – constitute a <a href="/wiki/Clade" title="Clade">clade</a> (traditionally a <a href="/wiki/Class_(biology)" title="Class (biology)">class</a> or subclass) of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>
I want the end result to be:
<p>
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> – sometimes considered synonymous with <b>Crossopterygii</b> – constitute a <a href="/wiki/Clade" title="Clade">clade</a> of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>
But when I use the preg_replace pattern below, it doesn't work: it gets confused by the parentheses inside the parentheses.
public function removeParentheses( $content ) {
    $pattern = '@\(.*?\)@';
    $content = preg_replace( $pattern, '', $content );
    // Tidy up the whitespace left behind by the removed groups.
    $content = str_replace( ' .', '.', $content );
    $content = str_replace( '  ', ' ', $content ); // collapse double spaces
    return $content;
}
Secondly, how can I leave the parentheses in a link's href="" and title="" alone? They are important when they are not inside text parentheses.
Answer 0 (score: 2)
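You can replace all the links with placeholders, then remove all the parentheses, and finally swap the placeholders back for their original values.
This is done with preg_replace_callback(), passing an occurrence counter and a replacements array to keep track of the links, then using your own removeParentheses() to strip the parentheses, and finally using str_replace() with array_keys() and array_values() to get your links back: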
<?php
$string = '<p>
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> (from Greek σαρξ <i>sarx</i>, flesh, and πτερυξ <i>pteryx</i>, fin) – sometimes considered synonymous with <b>Crossopterygii</b> ("fringe-finned fish", from Greek κροσσός <i>krossos</i>, fringe) – constitute a <a href="/wiki/Clade" title="Clade">clade</a> (traditionally a <a href="/wiki/Class_(biology)" title="Class (biology)">class</a> or subclass) of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>';
$occurrences = 0;
$replacements = [];
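$replacedString = preg_replace_callback("/<a .*?>.*?<\/a>/i", function($el) use (&$occurrences, &$replacements) {
    // The ||| on both sides are just to avoid unwanted matches; the closing ||| keeps
    // "|||1|||" from matching inside "|||10|||" when str_replace() restores the links.
    $key = "|||" . $occurrences++ . "|||";
    $replacements[$key] = $el[0];
    return $key;
}, $string);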
function removeParentheses( $content ) {
    $pattern = '@\(.*?\)@';
    $content = preg_replace( $pattern, '', $content );
    // Tidy up the whitespace left behind by the removed groups.
    $content = str_replace( ' .', '.', $content );
    $content = str_replace( '  ', ' ', $content ); // collapse double spaces
    return $content;
}
$replacedString = removeParentheses($replacedString);
$replacedString = str_replace(array_keys($replacements), array_values($replacements), $replacedString); // get your links back
echo $replacedString;
Result
<p>
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> – sometimes considered synonymous with <b>Crossopterygii</b> – constitute a <a href="/wiki/Clade" title="Clade">clade</a> of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>
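However, this seems a bit fragile to me. As others told you in the comments, you shouldn't parse HTML with regular expressions. A lot can change, and you can get unexpected results. Still, this might get you going in the right direction.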
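One way to avoid running regular expressions over raw markup is to walk the parsed DOM and only ever touch text nodes, so that href="" and title="" attributes are never inspected. The following is a minimal sketch of that idea, not part of the original answer: the function names stripParenthesesFromNode() and removeParenthesesDom() are hypothetical, it relies on PHP's DOM extension, and it assumes that any element sitting inside a parenthesised group can be dropped whole.

```php
<?php
// Hypothetical helper: removes parenthesised text from the text nodes under
// $parent. $depth is the current nesting level and is shared across sibling
// nodes, so a group may start in one text node and end in a later one.
function stripParenthesesFromNode(DOMNode $parent, int &$depth): void
{
    // Snapshot the live child list, since nodes are removed while iterating.
    foreach (iterator_to_array($parent->childNodes) as $child) {
        if ($child instanceof DOMText) {
            $kept = '';
            // Split into UTF-8 characters so multibyte text stays intact.
            foreach (preg_split('//u', $child->data, -1, PREG_SPLIT_NO_EMPTY) as $char) {
                if ($char === '(') {
                    $depth++;
                } elseif ($char === ')' && $depth > 0) {
                    $depth--;
                } elseif ($depth === 0) {
                    $kept .= $char;
                }
            }
            $child->data = $kept;
        } elseif ($depth > 0) {
            // Assumption: an element inside a parenthesised group goes away whole.
            $parent->removeChild($child);
        } elseif ($child->hasChildNodes()) {
            // Element outside any group: recurse into its children.
            stripParenthesesFromNode($child, $depth);
        }
    }
}

// Hypothetical wrapper: parses an HTML fragment, strips the groups, and
// serialises the children of <body> back to a string.
function removeParenthesesDom(string $html): string
{
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    // The XML prolog is a common workaround to make loadHTML() read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $depth = 0;
    stripParenthesesFromNode($body, $depth);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$sample = '<p>A (b <i>c</i> (d) e) <a href="/wiki/X_(y)" title="X (y)">x</a> f.</p>';
echo removeParenthesesDom($sample), "\n";
```

Because nesting is tracked with a counter, nested groups are handled without a recursive pattern. The whitespace cleanup from removeParentheses() would still have to be applied to the text nodes afterwards, and link text that itself contains parentheses is stripped as well, which may or may not be what you want.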
Edit: regarding parentheses inside parentheses, you can use a recursive pattern:
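function removeParentheses( $content ) {
    // (?R) recurses into the whole pattern, so nested groups are matched as one unit.
    $pattern = '@\(([^()]|(?R))*\)@';
    $content = preg_replace( $pattern, '', $content );
    // Tidy up the whitespace left behind by the removed groups.
    $content = str_replace( ' .', '.', $content );
    $content = str_replace( '  ', ' ', $content ); // collapse double spaces
    return $content;
}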
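To see why the recursive pattern helps, here is a small comparison of the lazy pattern from the question and the recursive one on a nested input. The input string is made up for illustration; both patterns are the ones shown above.

```php
<?php
// Made-up input with one group nested inside another.
$input = 'a(b(c)d)e';

// Lazy pattern from the question: the match ends at the first ")" it meets,
// so only "(b(c)" is removed and the outer ")" is left dangling.
$lazy = preg_replace('@\(.*?\)@', '', $input);

// Recursive pattern: (?R) re-enters the whole pattern at each nested "(",
// so the outer group "(b(c)d)" is matched and removed in one piece.
$recursive = preg_replace('@\(([^()]|(?R))*\)@', '', $input);

echo $lazy, "\n";      // ad)e
echo $recursive, "\n"; // ae
```

Note that the recursive pattern only matches balanced groups: a "(" with no matching ")" is left in place rather than swallowing the rest of the text.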