I am trying to use DOMDocument to scrape a table from another website. I am on shared hosting.
This is what the HTML looks like:
<tbody>
<tr class="odd">
<td class="nightclub">Elleven</td>
<td class="city">Downtown Miami</td>
</tr>
<tr class="even">
<td class="night club">Story</td>
<td class="city">South Beach</td>
</tr>
</tbody>
This is what I have tried:
<?php
$domDoc = new \DOMDocument();
$url = "http://example.com/";
$html = file_get_contents($url);
$domDoc->loadHtml($html);
$domDoc->preserveWhiteSpace = false;
$tables = $domDoc->getElementsByTagName('tbody');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
$columns = $row->getElementsByTagName('td');
print $columns->item(0)->nodeValue."/n";
print $columns->item(1)->nodeValue."/n";
print $columns->item(2)->nodeValue;
}
When I run this, I get no output. I think the server is blocking my request.
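One way to test that suspicion is to check whether the fetch itself fails before any parsing happens. The following is a minimal sketch, not a definitive diagnosis: the helper name `fetchHtml` and the User-Agent string are hypothetical, and it assumes `allow_url_fopen` is enabled, which some shared hosts turn off.

```php
<?php
// Hypothetical helper: returns the body as a string, or null if the fetch failed.
function fetchHtml(string $source): ?string
{
    // Assumed header value; some servers reject requests that send no User-Agent.
    $context = stream_context_create([
        'http' => [
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
            'timeout'    => 10,
        ],
    ]);
    // file_get_contents() returns false on failure, so check before parsing.
    $body = @file_get_contents($source, false, $context);
    return $body === false ? null : $body;
}

$html = fetchHtml('http://example.com/');
if ($html === null) {
    echo "Request failed; check allow_url_fopen and the server's response\n";
} else {
    echo strlen($html), " bytes received\n";
}
```

If the fetch returns null, the problem lies in the request rather than in the DOMDocument code.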
Answer 0 (score: 1)
Try using simplehtmldom (here):
// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');
// Find all tr
foreach($html->find('tr') as $element)
echo $element->innertext . '<br>';
It is a good library for parsing HTML; see its manual.
Answer 1 (score: 0)
If you don't mind using a library, this is the simplest solution. Use Simple Html Dom, as follows:
$args['include'] = array(152,5426,3057,5763,1720,3103...);
See the answer here for details.
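Answer 2 (score: 0)
What I did was use an open-source PHP package called Guzzle. It even lets you crawl sites that you are logged in to.
If you are on shared hosting, download Guzzle and upload it to your server.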
github.com/guzzle/guzzle/releases
<?php
require 'vendor/autoload.php';
$client = new GuzzleHttp\Client();
$domDoc = new DOMDocument();
$url = 'http://example.com';
$res = $client->request('GET', $url, [
'auth' => ['user', 'pass']
]);
$html = (string)$res->getBody();
// Discard whitespace; set before loading so the setting can take effect
$domDoc->preserveWhiteSpace = false;
// The @ in front of $domDoc will suppress any warnings
@$domDoc->loadHTML($html);
//the table by its tag name
$tables = $domDoc->getElementsByTagName('tbody');
//get all rows from the table
$rows = $tables->item(0)->getElementsByTagName('tr');
// loop over the table rows
foreach ($rows as $row)
{
// get each column by tag name
$columns = $row->getElementsByTagName('td');
// echo every cell that exists (the sample rows have only two columns)
foreach ($columns as $column) {
echo $column->nodeValue.'<br />';
}
}
?>
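The row walk above can be tried offline before pointing it at a live site. The following is a minimal sketch, assuming the sample table from the question is pasted in as a string; the helper name `extractRows` is hypothetical, and `libxml_use_internal_errors` is swapped in for `@` to silence parser warnings.

```php
<?php
// Hypothetical helper: returns each table row as an array of trimmed cell texts.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->nodeValue);
        }
        // Skip rows without <td> cells, such as header rows.
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Sample markup copied from the question, wrapped in a <table>.
$sample = '<table><tbody>'
    . '<tr class="odd"><td class="nightclub">Elleven</td><td class="city">Downtown Miami</td></tr>'
    . '<tr class="even"><td class="night club">Story</td><td class="city">South Beach</td></tr>'
    . '</tbody></table>';

foreach (extractRows($sample) as $cells) {
    echo implode(' / ', $cells), "\n";
}
```

Iterating over whatever cells exist avoids reading a third column that the sample rows do not have.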
Answer 3 (score: -1)
Your code is fine; just remove the \ from $domDoc = new \DOMDocument();
Try:
$domDoc = new DOMDocument();