How to get the elements between two tags with Simple HTML DOM

Asked: 2017-02-07 08:29:39

Tags: php dom web-scraping simple-html-dom

This is my HTML:

<b><font color="Red">Flash Player 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=5325359" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 1</a>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=5325360" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 2</a>
        <br>
        <br>
        <b><font color="Red">Dailymotion 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=k4r2rHPOgem8yAlGqjj" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 1</a>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=k63MLC2Vq6fxsPlGqjp" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 2</a>
        <br>
        <br>
        <b><font color="Red">TVLogy 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://reviewtv.in/star-plus/?si=YD29025" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 1</a>
        <br>
        <a href="http://reviewtv.in/star-plus/?si=YD29026" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 2</a>
        <br>
        <br>
        <b><font color="Red">Letwatch 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://www.tellycolors.me/star-plus/?si=j3vpekz3jeiv" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 1</a>
        <br>
        <a href="http://www.tellycolors.me/star-plus/?si=bdjg53bz9gdi" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 2</a>
        <br>
        <br>
        <b><font color="Red">Vidwatch 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://hd-rulez.info/vidwatch.php?id=73sbn356g9nc" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 1</a>
        <br>
        <a href="http://hd-rulez.info/vidwatch.php?id=73x796cifyvq" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 2</a>
        <br>
        <br>

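I am using the Simple HTML DOM PHP library for scraping. I want to scrape each <b> tag together with its anchor tags. Each <b> element is followed by a set of <a> anchors, so I want the result to look like this: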

array(
       'Flash Player' => array( 'link1', 'link2' ),
       'Dailymotion' => array('link1', 'link2', 'link3'),
       etc...
);

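Here is what I am doing. First I blank out all the <br> tags, then I loop over all the <b> tags and try to get each <b> tag's next sibling with $b->next_sibling(). That does not work, because blanking out the <br> tags does not update the element indexes. Here is my code: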

$html = str_get_html($html);
$content = $html->find('div.postcontent', 0);

// Blank out every <br>
foreach ($content->find('br') as $br) {
    $br->outertext = '';
}

// Print the text of each <b> heading
foreach ($content->find('b') as $key => $b) {
    echo $b->plaintext;
}

Please help me with another strategy for scraping the <b> tags together with their <a> tags. Thanks.

1 answer:

Answer 0 (score: 0)

I don't know whether there is a simpler way, but as long as every <b> tag is followed by exactly two <a> tags, this code gives the output you want.

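    $aCount = 0;          // running index into the list of all <a> tags
    $result = array();
    foreach ($content->find('b') as $key => $b) {
        $index = $b->plaintext;   // heading text becomes the array key
        // Take the next two <a> tags for this heading
        for ($i = 0; $i < 2; $i++) {
            $result[$index][] = $content->find('a', $aCount++)->href;
        }
    }
    print_r($result);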

The output looks like this:

Array
(
    [Flash Player 720p HD Quality Online Links] => Array
        (
            [0] => http://bestarticles.me/jaana-na-dil-se-door/?si=5325359
            [1] => http://bestarticles.me/jaana-na-dil-se-door/?si=5325360
        )

    [Dailymotion 720p HD Quality Online Links] => Array
        (
            [0] => http://bestarticles.me/jaana-na-dil-se-door/?si=k4r2rHPOgem8yAlGqjj
            [1] => http://bestarticles.me/jaana-na-dil-se-door/?si=k63MLC2Vq6fxsPlGqjp
        )

    [TVLogy 720p HD Quality Online Links] => Array
        (
            [0] => http://reviewtv.in/star-plus/?si=YD29025
            [1] => http://reviewtv.in/star-plus/?si=YD29026
        )

    [Letwatch 720p HD Quality Online Links] => Array
        (
            [0] => http://www.tellycolors.me/star-plus/?si=j3vpekz3jeiv
            [1] => http://www.tellycolors.me/star-plus/?si=bdjg53bz9gdi
        )

    [Vidwatch 720p HD Quality Online Links] => Array
        (
            [0] => http://hd-rulez.info/vidwatch.php?id=73sbn356g9nc
            [1] => http://hd-rulez.info/vidwatch.php?id=73x796cifyvq
        )

)
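
The loop above relies on each heading having exactly two links. If that count can vary, one possible approach is to visit the <b> and <a> elements in document order and attach each link to the most recent heading. The sketch below is only an illustration and swaps in PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM; the function name groupLinksByHeading is hypothetical, and it assumes the input is the HTML fragment that contains the headings and links.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not Simple HTML DOM.
// groupLinksByHeading() is a hypothetical helper name.
function groupLinksByHeading(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath   = new DOMXPath($doc);
    $result  = [];
    $current = null; // text of the most recent <b> heading

    // A single path expression returns <b> and <a> nodes in document order.
    foreach ($xpath->query('//*[self::b or self::a]') as $node) {
        if ($node->nodeName === 'b') {
            $current = trim($node->textContent);
            $result[$current] = $result[$current] ?? [];
        } elseif ($current !== null) {
            // Links that appear before any heading are skipped.
            $result[$current][] = $node->getAttribute('href');
        }
    }

    return $result;
}
```

With Simple HTML DOM already in use, the fragment could be taken from $content->innertext and passed in, for example print_r(groupLinksByHeading($content->innertext));. A heading with no links ends up with an empty array rather than being dropped.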