Storing DOM elements for use as a website's news section

Posted: 2017-11-08 08:13:39

Tags: javascript php jquery dom

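I have been able to use file_get_contents to crawl a website's news section and grab the title text of every article. How do I store this information and use it in a section of my own site?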

My PHP:

<?php
$html = file_get_contents("https://www.coindesk.com/category/news/");

$dom = new DomDocument();
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$finder = new DomXPath($dom);
$classname="fade";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
    echo $node->nodeValue."<br>"; 
} 
?>

This is where I want to put it:

<div id="box5" class="toggle" style="display: none;">
        <div id="services" class="services">
                <div class="container" >
                    <div class="service-head text-center">
                        <h2>NEWS</h2>
                        <span> </span>

                    </div>
                <button class="accordion">STORE THE POST TITLE HERE</button>
                <div class="panel1">
                  <p>STORE THE POST SUMMARY HERE WITH LINKS TO ARTICLE</p>
                </div>

                <button class="accordion">Section 2</button>
                <div class="panel1">
                  <p></p>
                </div>

                <button class="accordion">Section 3</button>
                <div class="panel1">
                  <p></p>
                </div>
          </div>
        </div>
      </div>

1 answer:

Answer 0 (score: 1)

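It's fairly simple: once the XPath expression has matched the content, store each node's content in an array or object. That array can be used later on the same page, saved to a database, or put in the session so another page can use it.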
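Besides a database or the session, another way to reuse the scraped items on other pages without re-fetching the source on every request is a small JSON cache file. The following is a minimal sketch under that assumption: the helper names `saveNewsCache`/`loadNewsCache`, the cache file name and the one-hour lifetime are hypothetical choices, and the sample item mimics an array of objects with `title` and `description` properties.

```php
<?php
// Hypothetical helpers: persist scraped news items as JSON so any page can
// read them back without scraping the source site again.

function saveNewsCache(string $file, array $items): void
{
    // LOCK_EX avoids two requests writing the file at the same time
    file_put_contents($file, json_encode($items), LOCK_EX);
}

// Returns the cached items (as stdClass objects), or null when the cache is
// missing or older than $maxAge seconds so the caller knows to re-scrape.
function loadNewsCache(string $file, int $maxAge = 3600): ?array
{
    if (!is_file($file) || time() - filemtime($file) > $maxAge) {
        return null;
    }
    $items = json_decode(file_get_contents($file)); // objects, not assoc arrays
    return is_array($items) ? $items : null;
}

// Usage: store once after scraping, read back wherever the news is shown.
$cacheFile = sys_get_temp_dir() . '/news-cache.json';
$output = [
    (object)['title' => 'First headline', 'description' => 'Short summary'],
];
saveNewsCache($cacheFile, $output);
$news = loadNewsCache($cacheFile);
```

Returning `null` for a stale or missing cache keeps the calling page simple: scrape only when `loadNewsCache()` gives nothing back.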

/* source url */
$url='https://www.coindesk.com/category/news/';

/* store results in this array */
$output=array();

/* XPath expressions */
$exp=new stdClass;
$exp->articles='//div[@id="content"]/div[ contains(@class,"article") ]/div[@class="post-info"]';
$exp->title='h3/a';
$exp->description='p[@class="desc"]';

/* Load the source url directly into DOMDocument */
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->preserveWhiteSpace=true;
$dom->strictErrorChecking=false;
$dom->substituteEntities=false;
$dom->recover=true;
$dom->formatOutput=true;
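libxml_use_internal_errors(true);   /* collect libxml warnings about malformed HTML instead of printing them */
$dom->loadHTMLFile( $url );
libxml_clear_errors();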

/* Query the DOM and process nodes found */
$xp=new DOMXPath( $dom );
$col=$xp->query( $exp->articles );

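if( !empty( $col ) && $col->length > 0 ){
    foreach( $col as $node ){
        $title=$xp->query( $exp->title, $node )->item(0);
        $description=$xp->query( $exp->description, $node )->item(0);

        /* skip articles where either expected child element is missing */
        if( !$title || !$description ) continue;

        $output[]=(object)array(
            'title'         =>  trim( $title->nodeValue ),
            'description'   =>  trim( $description->nodeValue )
        );
    }
}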
$dom = $xp = $col = $node = null;


/*
    The scraped items are stored in the $output array and can be
    used wherever you like on this page, or stored in a session
    variable (or a database) and used on other pages.
*/
if( !empty( $output ) ){
    /*
        `display:none` has been removed from the div below so the box is visible.
    */
    echo "
    <div id='box5' class='toggle'>
        <div id='services' class='services'>
            <div class='container' >
                <div class='service-head text-center'>
                    <h2>NEWS</h2>
                    <span> </span>
                </div>";

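    /* iterate through the output array (each member is an object), escaping scraped text before output */
    foreach( $output as $obj ){
        $title=htmlspecialchars( $obj->title, ENT_QUOTES, 'UTF-8' );
        $description=htmlspecialchars( $obj->description, ENT_QUOTES, 'UTF-8' );
        echo "
                <button class='accordion'>{$title}</button>
                <div class='panel1'>
                    <p>{$description}</p>
                </div>";
    }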

    echo "
            </div>
        </div>
    </div>";
}
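The question's template also wants a link to each article, which the answer does not extract. The following minimal sketch also reads the `href` of the `h3/a` title link and escapes every scraped value before building the accordion markup. It runs against an inline sample HTML string that assumes the same `post-info` / `h3/a` / `p.desc` layout the answer's XPath expressions target; coindesk's real markup may differ. The `esc()` helper and the "Read more" link text are hypothetical.

```php
<?php
// Sample markup assumed to match the answer's XPath expressions.
$html = '<div id="content"><div class="article"><div class="post-info">'
      . '<h3><a href="https://example.com/a1">Bitcoin &amp; Co</a></h3>'
      . '<p class="desc">Short summary of the article</p>'
      . '</div></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect HTML parse warnings silently
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$items = [];
$articles = '//div[@id="content"]/div[contains(@class,"article")]/div[@class="post-info"]';
foreach ($xp->query($articles) as $node) {
    $a    = $xp->query('h3/a', $node)->item(0);
    $desc = $xp->query('p[@class="desc"]', $node)->item(0);
    if ($a === null) {
        continue;                   // no title link: nothing useful to show
    }
    $items[] = (object)[
        'title'       => trim($a->nodeValue),
        'link'        => $a->getAttribute('href'),
        'description' => $desc ? trim($desc->nodeValue) : '',
    ];
}

// Hypothetical helper: escape scraped text for safe insertion into HTML.
function esc(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Build the accordion entries from the question's template.
$markup = '';
foreach ($items as $item) {
    $markup .= "<button class='accordion'>" . esc($item->title) . "</button>\n"
             . "<div class='panel1'><p>" . esc($item->description)
             . " <a href='" . esc($item->link) . "'>Read more</a></p></div>\n";
}
echo $markup;
```

Escaping matters because the titles and summaries come from another site; echoing them raw would let any markup they contain into your page.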