Data scraping with PHP

Time: 2016-12-07 08:47:13

Tags: php html laravel

I am trying to use DOMDocument to scrape a table from another website. I am on shared hosting.

This is what the HTML looks like:

<tbody>

<tr class="odd">
<td class="nightclub">Elleven</td>
<td class="city">Downtown Miami</td>
</tr>

<tr class="even">
<td class="night club">Story</td>
<td class="city">South Beach</td>
</tr>

</tbody>

What I have tried:

<?php
$domDoc = new \DOMDocument();
$url = "http://example.com/";
$html = file_get_contents($url);
$domDoc->loadHtml($html);

$domDoc->preserveWhiteSpace = false;


$tables = $domDoc->getElementsByTagName('tbody');



$rows = $tables->item(0)->getElementsByTagName('tr');


 foreach ($rows as $row)
 {

  $columns = $row->getElementsByTagName('td');

  // "\n" (backslash) is the newline escape; the sample table has two columns per row
  print $columns->item(0)->nodeValue."\n";
  print $columns->item(1)->nodeValue."\n";
}

When I do this, I get no results. I think the server is blocking my request.

4 answers:

Answer 0: (score: 1)
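One way to tell a failed request apart from a parsing problem is to check what file_get_contents() actually returned. The sketch below is only an illustration: the helper name fetchHtml, the User-Agent string, and the 10-second timeout are assumptions, not part of the original code. On shared hosting it may also be worth checking ini_get('allow_url_fopen'), since some hosts disable remote URLs for file_get_contents().

```php
<?php
// Hypothetical helper (not from the question): returns the page body,
// or null when the request fails, so the caller can tell the two cases apart.
function fetchHtml(string $url): ?string
{
    // Some servers reject requests that carry no User-Agent header.
    // The value below is an arbitrary example, not a required string.
    $context = stream_context_create([
        'http' => [
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
            'timeout'    => 10, // seconds; an assumed value
        ],
    ]);

    // @ hides the warning; the false return value is checked explicitly instead.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

If fetchHtml($url) returns null, the request itself failed (blocked, timed out, or remote URLs disabled), and there is nothing for loadHTML() to parse.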

Try using simplehtmldom:

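// Load the Simple HTML DOM library first (adjust the path to where the file lives)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all tr
foreach ($html->find('tr') as $element) {
    echo $element->innertext . '<br>';
}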

It is a good library for parsing HTML (see its manual).

Answer 1: (score: 0)

If you don't mind, this is the simplest solution. Use Simple Html Dom as follows:

$args['include'] = array(152,5426,3057,5763,1720,3103...);

See the linked answer for details.

Answer 2: (score: 0)

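What I did was use an open-source PHP package called Guzzle. It can even crawl the site you are working with.

If you are on shared hosting, download Guzzle and upload it to your server.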

github.com/guzzle/guzzle/releases

<?php
require 'vendor/autoload.php';

$client = new GuzzleHttp\Client();
$domDoc = new DOMDocument();
$url = 'http://example.com';

$res = $client->request('GET', $url, [
    'auth' => ['user', 'pass']
]);


$html = (string)$res->getBody();


// The @ in front of $domDoc will suppress any warnings
$domHtml = @$domDoc->loadHTML($html);

  //discard white space 
  $domDoc->preserveWhiteSpace = false;

  //the table by its tag name
  $tables = $domDoc->getElementsByTagName('tbody');


  //get all rows from the table
  $rows = $tables->item(0)->getElementsByTagName('tr');

  // loop over the table rows
  foreach ($rows as $row)
  {
   // get each column by tag name
      $columns = $row->getElementsByTagName('td');
   // echo the values (the sample table has two columns per row)
      echo $columns->item(0)->nodeValue.'<br />';
      echo $columns->item(1)->nodeValue.'<br />';
    }


?>
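As an alternative to the @ operator and the fixed column indexes, here is a minimal sketch that collects parser warnings with libxml_use_internal_errors() and reads however many cells each row contains. The helper name extractRows is hypothetical, and the inline markup is the sample from the question standing in for the fetched page; this is one possible approach, not the only one.

```php
<?php
// Hypothetical helper: returns one array of cell texts per table row.
function extractRows(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        // Iterate over the cells that exist rather than assuming a column count.
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->nodeValue);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}

// Sample markup copied from the question; in real use this would be the
// response body, e.g. (string) $res->getBody().
$html = '<table><tbody>'
    . '<tr class="odd"><td class="nightclub">Elleven</td><td class="city">Downtown Miami</td></tr>'
    . '<tr class="even"><td class="night club">Story</td><td class="city">South Beach</td></tr>'
    . '</tbody></table>';

$rows = extractRows($html);
foreach ($rows as $cells) {
    echo implode(' / ', $cells), "\n";
}
```

Looping over the td elements avoids reading a property on null when a row has fewer cells than expected.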

Answer 3: (score: -1)

Your code is fine; just remove the \ from $domDoc = new \DOMDocument();

Try

$domDoc = new DOMDocument();
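For context on the backslash: in PHP a leading \ makes the class name fully qualified, so in a file with no namespace declaration both spellings should refer to the same built-in class. A small sketch to check this (the variable names are illustrative only):

```php
<?php
// In the global scope, both forms resolve to the built-in DOMDocument class.
$withSlash    = new \DOMDocument();
$withoutSlash = new DOMDocument();

var_dump(get_class($withSlash) === get_class($withoutSlash));
```

Inside a namespaced file, such as a Laravel controller, the unqualified name is resolved relative to the current namespace, so there the leading backslash (or a `use DOMDocument;` import) is needed.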