Creating arrays in PHP from external HTML tables

Asked: 2017-09-22 00:50:53

Tags: php arrays curl

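I have some code that tries to pull the values of 2 separate tables from an external page and build an array for each row/column. Here is the HTML of the 2 tables.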

Table 1

<table class="report" cellspacing="0" >
  <thead>
    <tr>
      <th>Team</th>
      <th>Win %</th>
      <th>Games</th>
      <th>Wins</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="font-weight:bold;"> Division: A </td>
    </tr>
    <tr>
      <td> Team 1 </td>
      <td class='rightaligned'> 98.0 </td>
      <td class='rightaligned'> 51 </td>
      <td class='rightaligned'> 50 </td>
    </tr>
    <tr>
      <td> Team 6 </td>
      <td class='rightaligned'> 76.5 </td>
      <td class='rightaligned'> 51 </td>
      <td class='rightaligned'> 39 </td>
    </tr>
    <tr>
      <td> Team 8 </td>
      <td class='rightaligned'> 56.9 </td>
      <td class='rightaligned'> 51 </td>
      <td class='rightaligned'> 29 </td>
    </tr>
    <tr>
      <td> Team 4 </td>
      <td class='rightaligned'> 73.5 </td>
      <td class='rightaligned'> 34 </td>
      <td class='rightaligned'> 25 </td>
    </tr>
    <tr>
      <td> Team 9 </td>
      <td class='rightaligned'> 43.1 </td>
      <td class='rightaligned'> 51 </td>
      <td class='rightaligned'> 22 </td>
    </tr>
    <tr>
      <td> Team 5 </td>
      <td class='rightaligned'> 47.1 </td>
      <td class='rightaligned'> 34 </td>
      <td class='rightaligned'> 16 </td>
    </tr>
    <tr>
      <td> Team 10 </td>
      <td class='rightaligned'> 29.4 </td>
      <td class='rightaligned'> 51 </td>
      <td class='rightaligned'> 15 </td>
    </tr>
    <tr>
      <td> Team 7 </td>
      <td class='rightaligned'> 25.5 </td>
      <td class='rightaligned'> 51 </td>
      <td class='rightaligned'> 13 </td>
    </tr>
    <tr>
      <td> Team 2 </td>
      <td class='rightaligned'> 20.6 </td>
      <td class='rightaligned'> 34 </td>
      <td class='rightaligned'> 7 </td>
    </tr>
    <tr>
      <td> Team 3 </td>
      <td class='rightaligned'> 14.7 </td>
      <td class='rightaligned'> 34 </td>
      <td class='rightaligned'> 5 </td>
    </tr>
  </tbody>
</table>

Table 2

<table class="report" cellspacing="0" >
  <thead>
    <tr>
      <th>Team</th>
      <th>Against</th>
      <th>Date</th>
      <th>Week</th>
      <th>Games</th>
      <th>Wins</th>
      <th>Losses</th>
      <th>Forfeits</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td> Team 1 </td>
      <td> Team 7 </td>
      <td class='rightaligned'> 09/19/2017 </td>
      <td class='rightaligned'> 2 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 0 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 8 </td>
      <td> Team 9 </td>
      <td class='rightaligned'> 09/19/2017 </td>
      <td class='rightaligned'> 2 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 14 </td>
      <td class='rightaligned'> 3 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 6 </td>
      <td> Team 10 </td>
      <td class='rightaligned'> 09/19/2017 </td>
      <td class='rightaligned'> 2 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 14 </td>
      <td class='rightaligned'> 3 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 5 </td>
      <td> Team 4 </td>
      <td class='rightaligned'> 09/12/2017 </td>
      <td class='rightaligned'> 1 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 9 </td>
      <td class='rightaligned'> 8 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 4 </td>
      <td> Team 5 </td>
      <td class='rightaligned'> 09/12/2017 </td>
      <td class='rightaligned'> 1 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 8 </td>
      <td class='rightaligned'> 9 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 2 </td>
      <td> Team 7 </td>
      <td class='rightaligned'> 09/12/2017 </td>
      <td class='rightaligned'> 1 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 4 </td>
      <td class='rightaligned'> 13 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 9 </td>
      <td> Team 8 </td>
      <td class='rightaligned'> 09/19/2017 </td>
      <td class='rightaligned'> 2 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 3 </td>
      <td class='rightaligned'> 14 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 10 </td>
      <td> Team 6 </td>
      <td class='rightaligned'> 09/19/2017 </td>
      <td class='rightaligned'> 2 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 3 </td>
      <td class='rightaligned'> 14 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 3 </td>
      <td> Team 6 </td>
      <td class='rightaligned'> 09/12/2017 </td>
      <td class='rightaligned'> 1 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 2 </td>
      <td class='rightaligned'> 15 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
    <tr>
      <td> Team 7 </td>
      <td> Team 1 </td>
      <td class='rightaligned'> 09/19/2017 </td>
      <td class='rightaligned'> 2 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 0 </td>
      <td class='rightaligned'> 17 </td>
      <td class='rightaligned'> 0 </td>
    </tr>
  </tbody>
</table>

With the code below, I can pull the values of the first table into an array.

<?php
$url = '***';

$options = array(
    CURLOPT_RETURNTRANSFER => true,     // return web page
    CURLOPT_HEADER         => false,    // don't return headers
    CURLOPT_FOLLOWLOCATION => true,     // follow redirects
    CURLOPT_ENCODING       => "",       // handle all encodings
    CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0", // something like Firefox 
    CURLOPT_AUTOREFERER    => true,     // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
    CURLOPT_TIMEOUT        => 120,      // timeout on response
    CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
);

$curl = curl_init($url);
curl_setopt_array( $curl, $options );
$content = curl_exec($curl);
curl_close($curl);
$dom = new DOMDocument();
@$dom->loadHTML($content);
$xpath = new DOMXPath($dom); 

$tables = $dom->getElementsByTagName('tbody'); 
$rows = $tables->item(0)->getElementsByTagName('tr');

foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');

    $element[$i]['team']       = trim($cols->item(0)->nodeValue);
    $element[$i]['percentage'] = trim($cols->item(1)->nodeValue);
    $element[$i]['wins']       = trim($cols->item(2)->nodeValue);
    $element[$i]['games']      = trim($cols->item(3)->nodeValue);

    $i++;
}

echo '<pre>';            
print_r ($element);
echo '</pre>';
?>

The output ends up looking like this:

Array
(
    [] => Array
        (
            [team] => Division: A
            [percentage] => 
            [wins] => 
            [games] => 
        )

    [1] => Array
        (
            [team] => Team 1
            [percentage] => 98.0
            [wins] => 51
            [games] => 50
        )

    [2] => Array
        (
            [team] => Team 6
            [percentage] => 76.5
            [wins] => 51
            [games] => 39
        )

    [3] => Array
        (
            [team] => Team 8
            [percentage] => 56.9
            [wins] => 51
            [games] => 29
        )

    [4] => Array
        (
            [team] => Team 4
            [percentage] => 73.5
            [wins] => 34
            [games] => 25
        )

    [5] => Array
        (
            [team] => Team 9
            [percentage] => 43.1
            [wins] => 51
            [games] => 22
        )

    [6] => Array
        (
            [team] => Team 5
            [percentage] => 47.1
            [wins] => 34
            [games] => 16
        )

    [7] => Array
        (
            [team] => Team 10
            [percentage] => 29.4
            [wins] => 51
            [games] => 15
        )

    [8] => Array
        (
            [team] => Team 7
            [percentage] => 25.5
            [wins] => 51
            [games] => 13
        )

    [9] => Array
        (
            [team] => Team 2
            [percentage] => 20.6
            [wins] => 34
            [games] => 7
        )

    [10] => Array
        (
            [team] => Team 3
            [percentage] => 14.7
            [wins] => 34
            [games] => 5
        )

)

The output for the first table is fine, but the second table is missing entirely. How can I get the second table's data as well?

Thanks for any input.

1 Answer:

Answer 0 (score: 0)

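Print the child nodes of a table body. Because $tables is a DOMNodeList, childNodes has to be read from one of its items, for example the second one:

print_r($tables->item(1)->childNodes);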
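As a rough sketch of that inspection step, the snippet below counts the tbody elements found and the rows inside each one. The inline $content string is a made-up stand-in for the page fetched with cURL, and $rowCounts is a name introduced only for this illustration.

```php
<?php
// Hypothetical stand-in for the cURL response: two tiny tables.
$content = '<table class="report"><tbody><tr><td> Team 1 </td><td> 98.0 </td></tr></tbody></table>'
         . '<table class="report"><tbody><tr><td> Team 1 </td><td> Team 7 </td></tr></tbody></table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);

// getElementsByTagName() returns a DOMNodeList with every match in the document.
$tables = $dom->getElementsByTagName('tbody');
echo 'tbody elements found: ', $tables->length, "\n";

// Visit every item, not just item(0), and count its rows.
$rowCounts = array();
for ($t = 0; $t < $tables->length; $t++) {
    $rowCounts[$t] = $tables->item($t)->getElementsByTagName('tr')->length;
    echo 'tbody ', $t, ' has ', $rowCounts[$t], " row(s)\n";
}
```

If the count printed is 2, the second table was parsed correctly and the question's code simply never reads past item(0).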

Now that you know what the structure looks like, loop over the tables, and for each table loop over its rows and columns.
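The nested loop described above could look roughly like the following. This is a sketch under assumptions, not a drop-in replacement: $content is a shortened inline copy of the two tables rather than the real cURL response, and tableToRows() is a hypothetical helper. It takes the array keys from each table's th cells, which also avoids the mix-up in the question's code, where the 'wins' key is filled from the Games column and 'games' from the Wins column.

```php
<?php
// Hypothetical stand-in for the cURL response: one data row per table.
$content = <<<HTML
<table class="report">
  <thead><tr><th>Team</th><th>Win %</th><th>Games</th><th>Wins</th></tr></thead>
  <tbody>
    <tr><td style="font-weight:bold;"> Division: A </td></tr>
    <tr><td> Team 1 </td><td> 98.0 </td><td> 51 </td><td> 50 </td></tr>
  </tbody>
</table>
<table class="report">
  <thead><tr><th>Team</th><th>Against</th><th>Date</th><th>Week</th><th>Games</th><th>Wins</th><th>Losses</th><th>Forfeits</th></tr></thead>
  <tbody>
    <tr><td> Team 1 </td><td> Team 7 </td><td> 09/19/2017 </td><td> 2 </td><td> 17 </td><td> 17 </td><td> 0 </td><td> 0 </td></tr>
  </tbody>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);

// Hypothetical helper: convert one <table> into a list of rows keyed by its <th> labels.
function tableToRows(DOMElement $table)
{
    $headers = array();
    foreach ($table->getElementsByTagName('th') as $th) {
        $headers[] = trim($th->nodeValue);
    }

    $rows = array();
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->nodeValue);
        }
        // Skip the header row (no <td>) and short rows such as "Division: A".
        if (count($cells) !== count($headers)) {
            continue;
        }
        $rows[] = array_combine($headers, $cells);
    }
    return $rows;
}

// Outer loop: every table in the document, not only the first one.
$result = array();
foreach ($dom->getElementsByTagName('table') as $table) {
    $result[] = tableToRows($table);
}

echo '<pre>';
print_r($result);
echo '</pre>';
```

Here $result[0] holds the rows of the first table and $result[1] those of the second. Rows whose cell count does not match the header, like the "Division: A" line, are dropped. If the division name is needed, that check would have to be handled differently.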