Remove specific tags from HTML with PHP and dump the rest into an array

Asked: 2012-10-18 06:55:21

Tags: php html arrays string dom

Question:

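I am using Simple HTML DOM, and I want to remove certain pieces of information from the HTML I have managed to extract from a web page. The rest of the markup should go into an array.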

PHP code:

<?php
    include('simple_html_dom.php');

    // Create DOM from URL or file
    $html = file_get_html('http://www.dn.se');

    // Index 1 selects the second <ul class="list"> on the page (zero-based)
    $results = $html->find('ul[class=list]', 1);

    echo $results;
?>

HTML code:

<ul class="list">
    <li>Mest delat och rekommenderat på DN.se</li>

    <li><a href="/nyheter/varlden/polis-misstog-blind-for-samuraj"><span class="number">1.</span> Polis misstog blind för samuraj<br>
    <span class="metadata">I går</span></a> <span class="sharetooltip  init-done" data-url="" style="display:inline-block;"><a class="sharecount" href="javascript:void(0);">868</a> <span class="tooltip"><span class="counter twit"><span class="twitcount">35</span> tweets</span> <span class="counter fb"><span class="fbcount">833</span> rekommendationer</span></span></span></li>

    <li><a href="/debatt/s-vill-ha-tioarig-skolplikt-och-farre-elever-i-klassen"><span class="number">2.</span> ”S vill ha tioårig skolplikt och färre elever i klassen”<br>
    <span class="metadata">I går</span></a> <span class="sharetooltip  init-done" data-url="" style="display:inline-block;"><a class="sharecount" href="javascript:void(0);">671</a> <span class="tooltip"><span class="counter twit"><span class="twitcount">77</span> tweets</span> <span class="counter fb"><span class="fbcount">594</span> rekommendationer</span></span></span></li>

    <li><a href="/sthlm/edholm-backar-om-skolornas-smorforbud"><span class="number">3.</span> Edholm backar om skolornas smörförbud<br>
    <span class="metadata">16 okt</span></a> <span class="sharetooltip  init-done" data-url="" style="display:inline-block;"><a class="sharecount" href="javascript:void(0);">604</a> <span class="tooltip"><span class="counter twit"><span class="twitcount">33</span> tweets</span> <span class="counter fb"><span class="fbcount">571</span> rekommendationer</span></span></span></li>
</ul>

Desired array structure:

Iterate over parent/child elements, with <ul> as the parent.

Desired output:

Array 
{
    [0] => <li>Mest delat och rekommenderat på DN.se</li>
    [1] => <li><a href="/nyheter/varlden/polis-misstog-blind-for-samuraj">1. Polis misstog blind för samuraj</a></li>
        [0] => <a class="sharecount" href="javascript:void(0);">868</a>
        [1] => <span class="counter twit"><span class="twitcount">35</span> tweets</span>
        [2] => <span class="counter fb"><span class="fbcount">833</span> rekommendationer</span>
    [2] => <li><a href="/debatt/s-vill-ha-tioarig-skolplikt-och-farre-elever-i-klassen">2. ”S vill ha tioårig skolplikt och färre elever i klassen”</a></li>
        [0] => <a class="sharecount" href="javascript:void(0);">671</a>
        [1] => <span class="counter twit"><span class="twitcount">77</span> tweets</span>
        [2] => <span class="counter fb"><span class="fbcount">594</span> rekommendationer</span>
    [3] => <li><a href="/sthlm/edholm-backar-om-skolornas-smorforbud">3. Edholm backar om skolornas smörförbud</a></li>
        [0] => <a class="sharecount" href="javascript:void(0);">604</a>
        [1] => <span class="counter twit"><span class="twitcount">33</span> tweets</span>
        [2] => <span class="counter fb"><span class="fbcount">571</span> rekommendationer</span>
}
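
A possible way to build such a structure is sketched below. It is only a sketch under a few assumptions: it uses PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM so that it runs without external files, the function name listToArray is hypothetical, and because the desired output reuses the same numeric keys for items and their children, each entry is returned as an array with the assumed keys 'item' and 'children'.

```php
<?php
// Hypothetical helper (name and return shape are assumptions, not from the question).
// Takes the HTML of a <ul class="list"> and returns one entry per <li>.
function listToArray(string $html): array
{
    libxml_use_internal_errors(true); // tolerate non-strict, real-world HTML
    $doc = new DOMDocument();
    // The meta prefix tells the parser that the input is UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows  = [];

    foreach ($xpath->query('//ul[@class="list"]/li') as $li) {
        // 1. Collect the share-count elements before anything is removed.
        $children = [];
        $wanted = [
            './/a[@class="sharecount"]',
            './/span[@class="counter twit"]',
            './/span[@class="counter fb"]',
        ];
        foreach ($wanted as $query) {
            foreach ($xpath->query($query, $li) as $node) {
                $children[] = $doc->saveHTML($node);
            }
        }

        // 2. Remove the date, the line break, and the whole share widget.
        $unwanted = $xpath->query(
            './/span[@class="metadata"] | .//br | .//span[contains(@class, "sharetooltip")]',
            $li
        );
        foreach (iterator_to_array($unwanted) as $node) {
            $node->parentNode->removeChild($node);
        }

        // 3. Rebuild the item from the remaining text and, if present, the link.
        $text = trim((string) preg_replace('/\s+/u', ' ', $li->textContent));
        $link = $xpath->query('.//a[@href]', $li)->item(0);
        if ($link instanceof DOMElement) {
            $item = '<li><a href="' . htmlspecialchars($link->getAttribute('href')) . '">'
                  . htmlspecialchars($text) . '</a></li>';
        } else {
            $item = '<li>' . htmlspecialchars($text) . '</li>';
        }

        $rows[] = ['item' => $item, 'children' => $children];
    }

    return $rows;
}
```

Since echo $results prints the extracted markup in the code above, casting it to a string should yield the same HTML, so a call such as print_r(listToArray((string) $results)); would be one way to try this against the extracted list.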

0 answers:

No answers