Scrape multiple div classes with PHP and create a table

Date: 2018-01-16 22:32:55

Tags: php html web-scraping scrape

How can I scrape a page that contains several divs with the same class names, and build an HTML table from them? This is the page code:

<div class="date">20/11/2018</div>
<div class="time">12:00</div>
<div class="nation">Italy</div>

<div class="date">20/11/2020</div>
<div class="time">12:00</div>
<div class="nation">England</div>

<div class="date">20/11/2025</div>
<div class="time">13:00</div>
<div class="nation">Spain</div>

I would like to build an HTML table from the scraped data, for example:

DATE | TIME | NATION
X    | X    | X
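with one column per div class name.

So far I can only scrape a single div. I want my code to loop over every div of each class in the HTML page. Here is my code, without the table part: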

include("simple_html_dom.php");
$html = file_get_contents('https://test.test');
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DomXPath($dom);

$classname = "date";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$data1 = $nodes->item(0)->nodeValue; // the $nodes{0} brace syntax was removed in PHP 8; item(0) is only the first match

echo $data1;
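Because `$finder->query()` returns a `DOMNodeList`, which `foreach` can iterate, reading only index 0 is what limits the code above to one div. A minimal sketch of collecting every match instead, assuming the same XPath class test; the short inline `$html` is a hypothetical stand-in for the downloaded page:

```php
<?php
// Hypothetical stand-in for the page fetched with file_get_contents()
$html = '<div class="date">20/11/2018</div><div class="time">12:00</div>'
      . '<div class="date">20/11/2020</div><div class="time">12:00</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$classname = "date";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

// A DOMNodeList is iterable, so foreach visits every matching div, not just the first
$dates = array();
foreach ($nodes as $node) {
    $dates[] = trim($node->nodeValue);
}
echo implode(', ', $dates), "\n"; // 20/11/2018, 20/11/2020
```

Running the same loop once per class name (`date`, `time`, `nation`) gives one list per table column.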

1 answer:

Answer 0 (score: 0)

If you can target the parent node that contains the elements you want, you should be able to get what you are after.

Using the parent node

<?php
    $html="
    <div id='some_container_id'>
        <div class='date'>20/11/2018</div>
        <div class='time'>12:00</div>
        <div class='nation'>Italy</div>

        <div class='date'>20/11/2020</div>
        <div class='time'>12:00</div>
        <div class='nation'>England</div>

        <div class='date'>20/11/2025</div>
        <div class='time'>13:00</div>
        <div class='nation'>Spain</div>
    </div>";

    $dom = new DOMDocument;
    $dom->loadHTML( $html );
    $xp = new DOMXPath( $dom );

    $query = '//div[@id="some_container_id"]';
    $col=$xp->query( $query );

    if( $col && $col->length > 0 ){

        $arr=array();


        foreach( $col as $node ){
            $query=sprintf('div[@class="%s"]|div[@class="%s"]|div[@class="%s"]','date','time','nation');
            $nodes=$xp->query( $query, $node );
            if( $nodes->length > 0 ){
                foreach( $nodes as $item )$arr[]=$item->nodeValue;
            }
        }

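        // array_chunk() groups the flat list into rows of (date, time, nation);
        // default to an empty array so the foreach below never sees an undefined $chunks
        $chunks = !empty( $arr ) ? array_chunk( $arr, 3 ) : array();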

        echo '
        <table>
            <tr>
                <th>Date</th>
                <th>Time</th>
                <th>Nation</th>
            </tr>';

        foreach( $chunks as $chunk ){
            echo "
            <tr>
                <td>{$chunk[0]}</td>
                <td>{$chunk[1]}</td>
                <td>{$chunk[2]}</td>
            </tr>";
        }
        echo '
        </table>';
    }
?>
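The `array_chunk( $arr, 3 )` step relies on the divs appearing strictly as date, time, nation, repeated. A sketch of an alternative, assuming the same class names: query each class on its own, then pair the i-th date with the i-th time and nation using `array_map( null, ... )`. It no longer depends on the order of the divs inside a record, but it still assumes every record has all three divs. The markup is trimmed to two records for brevity.

```php
<?php
// Same sample markup as in the answer, trimmed to two records
$html = "<div id='some_container_id'>
    <div class='date'>20/11/2018</div><div class='time'>12:00</div><div class='nation'>Italy</div>
    <div class='date'>20/11/2020</div><div class='time'>12:00</div><div class='nation'>England</div>
</div>";

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Collect one column of values per class name
$columns = array();
foreach (array('date', 'time', 'nation') as $class) {
    $columns[$class] = array();
    foreach ($xp->query(sprintf('//div[@class="%s"]', $class)) as $node) {
        $columns[$class][] = trim($node->nodeValue);
    }
}

// array_map(null, ...) zips the columns: row i = (date i, time i, nation i)
$rows = array_map(null, $columns['date'], $columns['time'], $columns['nation']);

echo "<table>\n<tr><th>Date</th><th>Time</th><th>Nation</th></tr>\n";
foreach ($rows as $row) {
    // Escape scraped text before writing it into HTML
    echo '<tr><td>', implode('</td><td>', array_map('htmlspecialchars', $row)), "</td></tr>\n";
}
echo "</table>\n";
```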

Without the parent node

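// $html holds the same markup as in the previous example
$dom = new DOMDocument;
$dom->loadHTML( $html );
$xp = new DOMXPath( $dom );

// Match each class name as a whole word, so that e.g. "time" does not also match class="datetime"
$query = sprintf(
    '//div[ %s ]',
    implode( ' or ', array_map( function( $c ){
        return sprintf( 'contains( concat( " ", normalize-space( @class ), " " ), " %s " )', $c );
    }, array( 'date', 'time', 'nation' ) ) )
);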
$col=$xp->query( $query );

if( $col && $col->length > 0 ){

    $arr=array();
    foreach( $col as $node ){
        $arr[]=$node->nodeValue;
    }

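    // Group the flat list into rows of three; default to an empty array so $chunks is always defined
    $chunks = !empty( $arr ) ? array_chunk( $arr, 3 ) : array();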

    echo '
    <table>
        <tr>
            <th>Date</th>
            <th>Time</th>
            <th>Nation</th>
        </tr>';

    foreach( $chunks as $chunk ){
        echo "
        <tr>
            <td>{$chunk[0]}</td>
            <td>{$chunk[1]}</td>
            <td>{$chunk[2]}</td>
        </tr>";
    }
    echo '
    </table>';
}
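Both versions in this answer pair fields purely by position, so a single missing div shifts every later cell. A sketch that instead starts a new row at each `date` div and fills in the `time` and `nation` divs that follow it, assuming every record begins with a `date` div; `scrapeRows()` is a hypothetical helper name, not part of any library:

```php
<?php
// Hypothetical helper: a "date" div opens a new row, and the following
// "time"/"nation" divs are attached to it. A missing field leaves an empty
// cell instead of shifting later cells, as array_chunk() would.
function scrapeRows(string $html): array
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($dom);

    $rows = array();
    $current = null;
    // Matches are returned in document order
    foreach ($xp->query('//div[@class="date" or @class="time" or @class="nation"]') as $node) {
        $class = $node->getAttribute('class');
        if ($class === 'date') {
            if ($current !== null) {
                $rows[] = $current; // close the previous record
            }
            $current = array('date' => '', 'time' => '', 'nation' => '');
        }
        if ($current !== null) { // ignore fields that appear before the first date
            $current[$class] = trim($node->nodeValue);
        }
    }
    if ($current !== null) {
        $rows[] = $current; // close the last record
    }
    return $rows;
}

// Sample input: the second record has no "time" div
$html = '<div class="date">20/11/2018</div><div class="time">12:00</div><div class="nation">Italy</div>'
      . '<div class="date">20/11/2020</div><div class="nation">England</div>';

foreach (scrapeRows($html) as $row) {
    echo $row['date'], ' | ', $row['time'], ' | ', $row['nation'], "\n";
}
```

The returned associative rows can then be written into the same `<table>` markup as above, one `<td>` per key.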