Question

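Right now I can scrape content from my desired website without any trouble, but if you look at my demo, you can see that my array only contains the source line. No matter what I change, it doesn't fix it.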

$page = (isset($_GET['p'])&&$_GET['p']!=0) ? (int) $_GET['p'] : '';  
$html = file_get_html('http://screenrant.com/movie-news/'.$page);
foreach($html->find('#site-top ul h2 a') as $element)
{
        print '<br><br>';
        echo $url = ''.$element->href;
        $html2 = file_get_html($url);
        print '<br><br>';

        $image = $html2->find('meta[property=og:image]',0);
        print $news['image'] = $image->content;
        print '<br><br>';

        // Ending The Featured Image
        $title = $html2->find(' header > h1',0);
        print $news['title'] = $title->plaintext;

        print '<br>';
        // Ending the titles
        print '<br>';

        $articles = $html2->find('div.top-content > article > p');
        foreach ($articles as $article) {
            echo "$article->plaintext<p>";
        }
        $news['content'] =  $article->plaintext;

        print '<br><br>';
        #post> div:nth-child(2) > header > p > time
        $date = $html2->find('header > p > time',0);
        $news['date'] = $date->plaintext;

        $dexp = explode(', ',$date);

        print $date = $dexp[0].', '.$dexp[1];

        print '<br><br>';

        $genre = "news";
        print '<br>';

             mysqli_query($DB,"INSERT INTO `wp_scraped_news` SET
                                    `hash` = '".$news['title']."',
                                    `title` = '".$news['title']."',
                                    `image` = '".$news['image']."',
                                    `content` = '".$news['content']."'");
             print '<pre>';print_r($news);print '</pre>';
}

I am currently using simple_html_dom.php to do the scraping.

Answer 1

If you take a look at this piece of code:

$articles = $html2->find('div.top-content > article > p');
foreach ($articles as $article) {
   echo "$article->plaintext<p>"; 
   //This is printing the article content line by line
}
$news['content'] =  $article->plaintext; 
//This is grabbing the last line of the article content AKA the source 
//The last <p> as it's not in the foreach.
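
To see this behaviour in isolation, here is a minimal sketch that does not use simple_html_dom at all. The $paragraphs array is a hypothetical stand-in for the plaintext of each <p> element; the point is only that a foreach loop variable keeps its last value once the loop has finished.

```php
<?php
// Hypothetical stand-in for the plaintext of each <p> returned by find()
$paragraphs = ['First paragraph.', 'Second paragraph.', 'Source: example'];

foreach ($paragraphs as $paragraph) {
    // The loop body sees every element in turn
}

// After the loop, $paragraph still holds only the last element
$content = $paragraph;
echo $content, "\n";
```

Assigning after the loop therefore stores just the final paragraph, which on this page happens to be the source line.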

What you actually need to do is this:

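$articles = $html2->find('div.top-content > article > p');
$news['content'] = ''; // Reset for each article so earlier pages don't carry over
foreach ($articles as $article) {
    echo "$article->plaintext<p>";
    // Append every paragraph instead of keeping only the last one
    $news['content'] .= $article->plaintext . "<p>";
}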
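
As an alternative sketch of the same idea, the paragraphs can be collected into an array and joined once with implode(). Again, $paragraphs is a hypothetical stand-in for the $article->plaintext values, and $parts is a helper name introduced here for illustration.

```php
<?php
// Hypothetical stand-in for the plaintext of each <p> returned by find()
$paragraphs = ['First paragraph.', 'Second paragraph.', 'Source: example'];

$news  = [];
$parts = [];
foreach ($paragraphs as $paragraph) {
    // Collect every paragraph rather than overwriting a single variable
    $parts[] = $paragraph;
}

// Join the collected paragraphs with "<p>" between them
$news['content'] = implode('<p>', $parts);
echo $news['content'], "\n";
```

Unlike appending "<p>" inside the loop, implode() only places the separator between items, so there is no trailing "<p>" at the end of the stored content.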
