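Remove every li tag before reaching the first p tag in a string

Time: 2016-02-02 14:02:15

Tags: php regex string domdocument

Suppose I have a string containing some HTML. I want to remove every li tag that appears before the first p tag.

How can I achieve this?

Example string: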

$str = "<img src='something.png'/>some_text_here<li>needs_to_be_removed</li>
        <li>also_needs_to_be_removed</li>some_other_text<p>finally</p>more_text_here
        <li>this_should_not_be_removed</li>";

The first two li tags need to be removed.

4 answers:

Answer 0: (score: 1)

You can do this with PHP's DOMDocument, using the following traversal function:

$doc = new DOMDocument();
$doc->loadHTML($str);
$foundp = false;
showDOMNode($doc);
//now $doc contains the string you want
$newstr = $doc->saveHTML();


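function showDOMNode(DOMNode $domNode) {
    global $foundp;
    // childNodes is a live list: iterate over a copy so removing a node does not skip its next sibling
    foreach (iterator_to_array($domNode->childNodes) as $node)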
    {
        if ($node->nodeName == "li" && $foundp==false){
            //delete this node
            $domNode->removeChild($node);
        }
        else if ($node->nodeName == "p"){
            //stop here
            $foundp = true;
            return;
        }
        else if($node->hasChildNodes() && $foundp==false) {
            //recursively
            showDOMNode($node);
        }
    }    
}

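Answer 1: (score: 1)

Here is what you need. Simple and effective, as long as nothing you want to keep sits between the first <li> and the first <p>, since everything in that range is cut: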

$mystring = "mystringwith<li>toberemovedstring</li><li>againremove</li><p>do not remove me</p>"; // the string you provide
$findme  = '<li>'; // where the removal starts
$findpee = '<p>';  // where the removal ends
$pos    = strpos($mystring, $findme);  // position of the first <li>
$pospee = strpos($mystring, $findpee); // position of the first <p>
// strpos() returns false when a tag is missing, so only cut when <li> really comes before <p>
$result = $mystring;
if ($pos !== false && $pospee !== false && $pos < $pospee) {
    $result = substr_replace($mystring, "", $pos, $pospee - $pos);
}

echo $result;

Edit: PHP sandbox

http://sandbox.onlinephpfunctions.com/code/e534259e2312682a04b64c6e3aae1521422aacd2

You can also check the result there.
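
If there is text between the li tags and the first p tag that has to survive (as in the question's sample string), a string-only variant could split the input at the first <p> and strip <li>...</li> pairs from the front part only. This is just a sketch: the function name removeLiBeforeFirstP is made up here, it assumes li elements are not nested, and it leaves the string untouched when there is no <p> at all.

```php
<?php
// Hypothetical helper, not part of the answer above.
function removeLiBeforeFirstP(string $html): string
{
    // Find the first <p> tag (with or without attributes), case-insensitively.
    if (!preg_match('/<p[\s>]/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // assumption: no <p> means nothing should be removed
    }
    $pPos = $m[0][1];

    $head = substr($html, 0, $pPos); // everything before the first <p>
    $tail = substr($html, $pPos);    // the first <p> and everything after it

    // Drop each <li>...</li> pair in the head only (assumes no nested <li>).
    $head = preg_replace('/<li\b[^>]*>.*?<\/li>/is', '', $head);

    return $head . $tail;
}

$str = "<img src='something.png'/>some_text_here<li>needs_to_be_removed</li>
        <li>also_needs_to_be_removed</li>some_other_text<p>finally</p>more_text_here
        <li>this_should_not_be_removed</li>";

$cleaned = removeLiBeforeFirstP($str);
echo $cleaned;
```

Regex on HTML stays fragile (comments, attributes containing ">", nesting), so for anything beyond simple fragments a DOM-based approach is the safer bet.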

Answer 2: (score: 1)

Using XPath:

$str = "<img src='something.png'/>some_text_here<li>needs_to_be_removed</li>
        <li>also_needs_to_be_removed</li>some_other_text<p>finally</p>more_text_here
        <li>this_should_not_be_removed</li>";

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML('<div>' . $str .'</div>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
             // ^---------------^----- add a root element
$xp = new DOMXPath($dom);

$lis = $xp->query('//p[1]/preceding-sibling::li');

foreach ($lis as $li) {
    $li->parentNode->removeChild($li);
}

$result = '';
// add each child node of the root element to the result
foreach ($dom->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
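
A possible variation on the query above, for input where the li tags are not direct siblings of the p (for example wrapped in a ul): `//p[1]` matches every p that is the first p child of its own parent, while `(//p)[1]` picks the first p in document order, and the `preceding` axis reaches every earlier li that is not an ancestor of that p. The wrapper function name stripLiBeforeFirstP is an assumption of this sketch, not something from the answer.

```php
<?php
// Hypothetical wrapper around the XPath approach, using the preceding axis.
function stripLiBeforeFirstP(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    // Same trick as above: wrap the fragment in a root <div>.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    // Every <li> that ends before the first <p> in document order starts.
    foreach ($xp->query('(//p)[1]/preceding::li') as $li) {
        $li->parentNode->removeChild($li);
    }

    // Serialize the children of the root <div>, not the <div> itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$nested = stripLiBeforeFirstP('<ul><li>a</li></ul><p>x</p><li>b</li>');
echo $nested;
```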

Answer 3: (score: 0)

I'd suggest that using a PHP parser library would be a better and faster approach. I personally use https://github.com/paquettg/php-html-parser in my projects. It provides an API for selecting and traversing the parsed nodes, and many more things that can come in handy.

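You could loop over all the elements with foreach, noting the "li" tags as you go; once you hit a "p" tag (the third element in this example), you can remove the elements before it via $child->previousSibling();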
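
The backwards walk described here can be sketched with PHP's built-in DOM extension instead of the library, so that no library calls have to be guessed. Treat it as an illustration of the idea only: the wrapping <div> and the variable names are choices made for this sketch.

```php
<?php
$str = "<img src='something.png'/>some_text_here<li>needs_to_be_removed</li>
        <li>also_needs_to_be_removed</li>some_other_text<p>finally</p>more_text_here
        <li>this_should_not_be_removed</li>";

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML('<div>' . $str . '</div>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
libxml_clear_errors();

// Start at the first <p> and walk backwards through its siblings.
$firstP = $dom->getElementsByTagName('p')->item(0);
$node = $firstP ? $firstP->previousSibling : null;
while ($node !== null) {
    $prev = $node->previousSibling; // remember the next stop before removing anything
    if ($node->nodeName === 'li') {
        $node->parentNode->removeChild($node);
    }
    $node = $prev;
}

// Serialize the children of the wrapping <div>.
$result = '';
foreach ($dom->documentElement->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```

Text nodes and the img are skipped by the nodeName check, so only the li elements in front of the first p disappear.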