Right now I have this mess (don't ask how):
$string = "
<p>
<b>Foo1:</b> Bar1<br>
<b>Foo2:</b> Bar2<br>
<b>Foo3:</b> Bar3<br>
<b>Foo4:</b> Bar4
</p>
<br>
<p></p>
<br>
<p>
</br>
<br />
<br/>
<br>
</p>
";
So I need to trim all of those <br> and <p> tags, like this:
$string = "
<p>
<b>Foo1:</b> Bar1<br>
<b>Foo2:</b> Bar2<br>
<b>Foo3:</b> Bar3<br>
<b>Foo4:</b> Bar4
</p>
";
I tried this:
$chars = " \t\n\r\0\x0B";
$subpattern = '(</?(br|p) ?/?[^>]*>)';
$pattern = '~(^'.$subpattern.'|'.$subpattern.'$)~i';
$string = trim(preg_replace($pattern, '', $string), $chars);
But it only removes the last <p>. How can I make it work properly?
Answer 0: (score: 0)
Use the strip_tags function; see its description in the PHP documentation.
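To make the suggestion concrete, here is a minimal sketch of what strip_tags does to markup like the question's. The `$sample` string is a shortened, hypothetical stand-in for the real input. Note that strip_tags works on the whole string, not only on its edges, so it may remove more than the question wants.

```php
<?php
// Hypothetical sample, shortened from the question's markup.
$sample = "<p><b>Foo1:</b> Bar1<br></p><br><p></p>";

// Without an allow-list every tag is removed, inside the paragraph as well.
$plain = strip_tags($sample);

// With an allow-list, <b> and <br> survive, but every <p> is still dropped
// and the stray <br> after the paragraph is kept.
$partial = strip_tags($sample, '<b><br>');

echo $plain, "\n";    // Foo1: Bar1
echo $partial, "\n";  // <b>Foo1:</b> Bar1<br><br>
```

If the wrapping <p> has to survive while the trailing empty blocks go away, a parser-based approach is probably a better fit.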
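Answer 1: (score: 0)
Rather than attempting a regex approach, try parsing the HTML and then discarding the empty elements, since that is really what you are trying to achieve. Something like DOMDocument::loadHTML (http://php.net/manual/en/domdocument.loadhtml.php) will give you a DOM tree that you can loop over and convert back to HTML once you have removed the parts you don't need.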
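A minimal sketch of that idea, assuming "empty" means a node that holds nothing but whitespace, <br>, or other empty <p> elements. The helper names `isBlankNode` and `trimBlankBlocks` are made up for illustration and are not part of any library.

```php
<?php
// Hypothetical helper: true if $node holds only whitespace, <br>, or empty <p>.
function isBlankNode(DOMNode $node): bool
{
    if ($node->nodeType === XML_TEXT_NODE) {
        return trim($node->nodeValue) === '';
    }
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return true; // assumption: comments and the like count as blank
    }
    if ($node->nodeName === 'br') {
        return true;
    }
    if ($node->nodeName !== 'p') {
        return false; // any other element counts as real content
    }
    foreach ($node->childNodes as $child) {
        if (!isBlankNode($child)) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: parse $html and re-serialize only the non-blank top-level nodes.
function trimBlankBlocks(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML rejects an empty string
    }
    $previous = libxml_use_internal_errors(true); // tolerate sloppy tags such as </br>
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        if (!isBlankNode($child)) {
            $out .= $dom->saveHTML($child);
        }
    }
    return trim($out);
}

$input = "\n<p>\n<b>Foo1:</b> Bar1<br>\n<b>Foo4:</b> Bar4\n</p>\n<br>\n<p></p>\n<br>\n<p>\n</br>\n<br />\n<br/>\n<br>\n</p>\n";
echo trimBlankBlocks($input), "\n";
```

This sketch drops blank blocks anywhere at the top level, not just at the ends of the string. For the question's input the two are the same, but restricting it to leading and trailing nodes would need an extra pass.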
Answer 2: (score: 0)
An approach using DOMDocument and DOMXPath:
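// XPath callback: php:function() passes the context node as a one-element array.
// Returns true when the node contains only whitespace, <br>, or (recursively) empty <p>.
function isEmpty($n) {
    $nodeList = $n[0]->childNodes;
    foreach ($nodeList as $childNode) {
        switch ( $childNode->nodeType ) {
            case XML_ELEMENT_NODE:
                // Any element other than <p>/<br>, or a non-empty nested <p>, counts as content.
                if ( !in_array($childNode->nodeName, ["p", "br"]) ||
                     ($childNode->nodeName == "p" && !isEmpty([$childNode])) ) return false;
                break;
            case XML_TEXT_NODE:
                if ( trim($childNode->nodeValue) !== "" ) return false;
                break;
        }
    }
    return true;
}
$dom = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings about malformed markup such as </br>
$dom->loadHTML($string);
$xp = new DOMXPath($dom);
// Expose isEmpty() to XPath through the php: namespace.
$xp->registerNamespace("php", "http://php.net/xpath");
$xp->registerPHPFunctions('isEmpty');
// Select every <br> outside a <p>, plus every <p> that isEmpty() reports as empty.
$nodeList = $xp->query('//br[not(./ancestor::p)] | //p[php:function("isEmpty", .)]');
foreach ($nodeList as $node) {
    $node->parentNode->removeChild($node);
}
// Print what is left inside <body>.
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $childNode) {
    echo $dom->saveHTML($childNode);
}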
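Answer 3: (score: 0)
You should not parse HTML with regex; use DOMDocument instead. Here we simply query the DOMDocument with the XPath expression //p/b/.., which selects every <p> that has a <b> child.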
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$string = <<<HTML
<p>
<b>Foo1:</b> Bar1<br>
<b>Foo2:</b> Bar2<br>
<b>Foo3:</b> Bar3<br>
<b>Foo4:</b> Bar4
</p>
</p>
<br>
<p></p>
<br>
<p>
</br>
<br/ >
<br/>
<br>
</p>
HTML;
$domObject = new DOMDocument();
$domObject->loadHTML($string, LIBXML_HTML_NODEFDTD);
$domXpath = new DOMXPath($domObject);
$results = $domXpath->query('//p/b/..');
foreach ($results as $result) {
    echo $domObject->saveHTML($result);
}
Output:
<p>
<b>Foo1:</b> Bar1<br>
<b>Foo2:</b> Bar2<br>
<b>Foo3:</b> Bar3<br>
<b>Foo4:</b> Bar4
</p>
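The //p/b/.. query only finds paragraphs that contain a <b>. As a hedged variant for input where the content paragraph might not have one, the sketch below keeps every <p> with any non-whitespace text by using the standard XPath normalize-space() function. The `$html` sample is a shortened, hypothetical version of the question's markup.

```php
<?php
libxml_use_internal_errors(true); // tolerate malformed tags

// Hypothetical sample modelled on the question's markup.
$html = "<p>\n<b>Foo1:</b> Bar1<br>\n<b>Foo2:</b> Bar2\n</p>\n<br>\n<p></p>\n<br>\n<p>\n<br />\n<br>\n</p>";

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// normalize-space() yields an empty string for a <p> that holds only
// whitespace and <br>, and an empty string is false in an XPath predicate.
$kept = $xpath->query('//p[normalize-space()]');

foreach ($kept as $p) {
    echo $dom->saveHTML($p), "\n";
}
```

With this sample only the first paragraph is selected. A paragraph containing just an image or another non-text element would also be treated as empty, so this is a sketch for text-bearing paragraphs only.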