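I've been trying to use web scraping to extract some links and their associated text from a website/web page, but I seem to be missing something, because all I get is a blank page. I hope you can point out my mistake.
The HTML page is as follows: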
<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<div class="NeededDiv">
<a href="link">text</a>
<a href="link">text</a>
<a href="link">text</a>
<a href="link">text</a>
<a href="link">text</a>
</div>
<div class="ExtraDiv">
<a href=""></a>
<a href=""></a>
<a href=""></a>
<a href=""></a>
<a href=""></a>
</div>
</body>
</html>
The PHP code is:
<?php
function get_data($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
curl_close($ch);
return $result;
}
$returned_content = get_data('file:///C:/xampp/htdocs/h/1.html');
$first_step = explode( '<div class="NeededDiv">' , $returned_content );
$second_step = explode('</div>', $first_step[0]);
$third_step = explode('</a>', $second_step[0]);
?>
So here I'm trying to use a PHP page to extract a specific div from the page. I opened the PHP page through XAMPP on localhost.
Any help is appreciated.
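Two things in the question's code stand out. explode() puts the text before the delimiter at index 0, so $first_step[0] is everything before `<div class="NeededDiv">` (the doctype, the head and the opening body tag), while the div's contents are in $first_step[1]. Also, the script never echoes anything, so the browser shows a blank page whatever the variables contain. Below is a minimal sketch of the same explode()-based approach with both points addressed. The function name extract_links_explode() and the sample $html string are illustrative assumptions, and the sketch assumes the markup looks like the sample (exact `<div class="NeededDiv">` tag, double-quoted href, no nested divs).

```php
<?php
// Hypothetical helper: the question's explode() approach with two fixes,
// using index [1] after the first split and returning href/text pairs.
function extract_links_explode(string $html): array
{
    $first_step = explode('<div class="NeededDiv">', $html);
    if (count($first_step) < 2) {
        return []; // the opening div was not found
    }
    // Index 1 holds everything AFTER the opening tag; cut at the first </div>.
    $second_step = explode('</div>', $first_step[1]);

    $links = [];
    foreach (explode('</a>', $second_step[0]) as $chunk) {
        // Each chunk looks like "\n<a href=\"link\">text"; the last is just whitespace.
        $pos = strpos($chunk, '<a ');
        if ($pos === false) {
            continue;
        }
        $anchor = substr($chunk, $pos);
        preg_match('/href="([^"]*)"/', $anchor, $m);
        $links[] = [
            'href' => $m[1] ?? '',
            'text' => trim(strip_tags($anchor)), // drops the opening <a ...> tag
        ];
    }
    return $links;
}

// Sample input (assumption); in the question this would be $returned_content.
$html = '<div class="NeededDiv">
<a href="link">text</a>
<a href="link2">text2</a>
</div>
<div class="ExtraDiv">
<a href=""></a>
</div>';

foreach (extract_links_explode($html) as $link) {
    echo htmlspecialchars($link['href']), ' - ', htmlspecialchars($link['text']), "<br/>\n";
}
```

In the question's setup you would call extract_links_explode($returned_content) instead of using the sample string. Splitting strings like this breaks as soon as the markup changes slightly (an extra class or attribute on the div, single-quoted attributes, nested divs), which is why a real HTML parser is usually the safer choice.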
Answer 0 (score: 1)
Is this what you want?
<?php
$returned_content='<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<div class="NeededDiv">
<a href="link">text</a>
<a href="link">text</a>
<a href="link">text</a>
<a href="link">text</a>
<a href="link">text</a>
</div>
<div class="ExtraDiv">
<a href=""></a>
<a href=""></a>
<a href=""></a>
<a href=""></a>
<a href=""></a>
</div>
</body>
</html>';
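// Parse the HTML string into a DOM tree
$dom = new DOMDocument;
$dom->loadHTML($returned_content);
// Print the href of every <a> element on the page
// (this includes the empty links inside ExtraDiv)
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $node->getAttribute('href') . "<br/>";
}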
?>
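The answer above prints the href of every <a> on the page, including the empty ones inside ExtraDiv. If only the links inside NeededDiv are wanted, together with their text as the question asks, DOMXPath can narrow the selection. A minimal sketch; extract_needed_links() and the sample $html are hypothetical names used only for illustration:

```php
<?php
// Hypothetical helper: return href/text pairs for <a> elements inside
// any <div> whose class list contains "NeededDiv".
function extract_needed_links(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($dom);
    // Match "NeededDiv" as one whole class in a space-separated class list.
    $nodes = $xpath->query(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' NeededDiv ')]//a"
    );
    if ($nodes === false) {
        return []; // invalid XPath expression
    }

    $links = [];
    foreach ($nodes as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// Sample page (assumption), shaped like the one in the question.
$html = '<!DOCTYPE html>
<html><head><title>test</title></head><body>
<div class="NeededDiv">
<a href="link">text</a>
<a href="link2">text2</a>
</div>
<div class="ExtraDiv">
<a href=""></a>
</div>
</body></html>';

foreach (extract_needed_links($html) as $link) {
    echo htmlspecialchars($link['href']), ' - ', htmlspecialchars($link['text']), "<br/>\n";
}
```

For the question's local file you could pass $returned_content from get_data(), or read the file directly with file_get_contents('C:/xampp/htdocs/h/1.html'). The concat(' ', normalize-space(@class), ' ') expression is a common XPath idiom for matching one class among several; if the div only ever carries that single class, //div[@class="NeededDiv"]//a is enough.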