HTML DOM Parser - How to get the first post of every topic in a forum

Date: 2011-03-10 14:47:17

Tags: php screen-scraping html-parsing simple-html-dom

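I'm trying to scrape the first post of every topic in the SitePoint JavaScript forum, but the DOM parser gives me every post of every topic instead. Maybe I'm not traversing the DOM correctly? Here is my code: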

<?php

class Sitepoint extends Controller
{
    public function index()
    {
        $this->load->helper('dom');
        header('Content-Type: text/html; charset=utf-8');
        echo '<ol>';

            $html = file_get_html('http://www.sitepoint.com/forums/javascript-15');

            foreach($html->find('a[id^="thread_title"]') as $topic) {
                $post =$topic->href;
                $posthtml = file_get_html($post);
                $posthtml->find('div[id^="post_message"]', 0);
                echo'<li>';
                echo $topic->plaintext.'<br>';
                echo $posthtml->plaintext.'<br>';
                echo'</li>';
            }
        echo '</ol>';
    }
}

1 Answer:

Answer 0 (score: 1)

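You forgot to assign the result of $posthtml->find to a variable. The match is discarded, so $posthtml->plaintext prints the text of the whole page, that is, every post in the topic. Store the result and echo that instead: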

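foreach ($html->find('a[id^="thread_title"]') as $topic) {
    $post = $topic->href;
    $posthtml = file_get_html($post);
    // Keep the element returned by find(); index 0 selects only the first match.
    $posttext = $posthtml->find('div[id^="post_message_"]', 0);
    echo '<li>';
    echo $topic->plaintext . '<br>';
    // find() returns null when nothing matches, so guard before reading plaintext.
    echo ($posttext ? $posttext->plaintext : '') . '<br>';
    echo '</li>';
    // Free the parsed document before loading the next topic.
    $posthtml->clear();
}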
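
If you want to check the "first match only" logic without hitting the live forum, here is a minimal sketch of the same idea. Note that it uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, so it runs with no extra library. The sample markup and the firstPostText helper are made up for illustration; the real thread pages will look different.

```php
<?php
// Hypothetical, simplified thread markup; the real SitePoint page is far larger.
$threadHtml = <<<HTML
<html><body>
  <div id="post_message_101">First post: how do closures work?</div>
  <div id="post_message_102">Reply: a closure captures its scope.</div>
  <div id="post_message_103">Reply: thanks!</div>
</body></html>
HTML;

/**
 * Return the trimmed text of the first <div> whose id starts with $idPrefix,
 * or null when the page has no such element. (Illustrative helper, not part
 * of any library.)
 */
function firstPostText(string $html, string $idPrefix = 'post_message_'): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // starts-with() plays the role of the CSS selector div[id^="post_message_"];
    // the (...)[1] wrapper keeps only the first match in document order.
    $nodes = $xpath->query(
        sprintf('(//div[starts-with(@id, "%s")])[1]', $idPrefix)
    );

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo firstPostText($threadHtml), "\n"; // prints "First post: how do closures work?"
```

The point is the same as in the fix above: hold on to the node that the query returns and read the text from that node, not from the whole document.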