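For example, suppose a web page contains a number of links.
forward backward
Treat these as two links. I want to first load the page that contains these links and then click one of them. Note: [I do not know the URL that a click will load, because it changes randomly.]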
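Answer 0 (score: 4)
This is an old post, but for anyone looking for an answer: I had a similar problem and was able to solve it. I used PHP and cURL.
The code for following a link with cURL is quite simple.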
// Create a user agent so websites don't block you
$userAgent = 'Googlebot/2.1 (http://www.google.bot.com/bot.html)';
// Create the initial link you want.
$target_url = "http://www.example.com/somepage";
// Initialize cURL and set the following options
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP codes >= 400 as failures
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_AUTOREFERER, true);    // set Referer automatically on redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
// Grab the html from the page
$html = curl_exec($ch);
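// Error handling: curl_exec() returns false on failure (an empty body is treated as an error too)
if (!$html) {
    // Handle the error here (page not reachable, HTTP error, timeout, etc.)
    echo 'cURL error: ' . curl_error($ch);
    exit();
}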
// Create a new DOM Document to handle scraping
$dom = new DOMDocument();
@$dom->loadHTML($html);
// Get your element. You can do this in several ways: by tag, by id, or with a DOMXPath object.
// This example gets the element with id "forward-link", which might be a div, ul, li, etc.
// It then gets all the a tags (links) inside that element,
// takes the first link in the list, and reads the href attribute from it.
$search = $dom->getElementById('forward-link');
$forwardlinks = $search->getElementsByTagName('a');
$forwardlink = $forwardlinks->item(0);
$href = $forwardlink->getAttribute('href');
// Now that you have the link you want to follow ("click"),
// set the cURL target to the new URL. If $href is relative, resolve it against $target_url first.
curl_setopt($ch, CURLOPT_URL, $href);
$html = curl_exec($ch);
// Do what you want with the new page!
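One thing the snippet above does not cover: the href read from the page is often relative (for example /next?id=3), while cURL needs an absolute URL. A minimal sketch of resolving it against the page URL follows. The function name resolveUrl is hypothetical, the example URLs are made up, and the logic is deliberately simplified (it does not collapse "../" or "./" segments).

```php
<?php
// Hypothetical helper (not part of the original answer): turn an href found
// on the page at $base into an absolute URL that cURL can request.
// Simplified: "../" and "./" segments are not normalized.
function resolveUrl(string $base, string $href): string
{
    // Already absolute (has a scheme such as http or https).
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path   = $parts['path'] ?? '/';

    // Protocol-relative: //host/path
    if (strncmp($href, '//', 2) === 0) {
        return $parts['scheme'] . ':' . $href;
    }
    // Root-relative: /path
    if (strncmp($href, '/', 1) === 0) {
        return $origin . $href;
    }
    // Query-only: ?id=2 keeps the current path
    if (strncmp($href, '?', 1) === 0) {
        return $origin . $path . $href;
    }
    // Otherwise, relative to the directory of the base path.
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

echo resolveUrl('http://www.example.com/somepage', '/next?id=3'), "\n";
echo resolveUrl('http://www.example.com/dir/page.html', 'other.html'), "\n";
```

With a helper like this, the follow-up request would use resolveUrl($target_url, $href) as the value for CURLOPT_URL instead of the raw href.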
Here is a good tutorial: php curl tutorial
Answer 1 (score: 3)
You have to parse the HTML returned by cURL, find the links, and then fetch those links with new cURL requests.
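As a rough illustration of the parsing step, the sketch below collects every link in an HTML string with DOMDocument and DOMXPath, keyed by link text, so that the "forward" or "backward" URL can be looked up without knowing it in advance. The function name extractLinks and the sample markup are assumptions made for this example. In real use, $html would be the string returned by curl_exec().

```php
<?php
// Hypothetical helper: returns [link text => href] for every <a href> in $html.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $a) {
        // If two links share the same text, the later one wins.
        $links[trim($a->textContent)] = $a->getAttribute('href');
    }
    return $links;
}

// Sample markup standing in for the page fetched by cURL.
$html = '<html><body>'
      . '<a href="/page?id=17">forward</a> '
      . '<a href="/page?id=15">backward</a>'
      . '</body></html>';

$links = extractLinks($html);
echo $links['forward'], "\n"; // prints /page?id=17
```

The href found this way is then passed to a new cURL request, as the answer describes.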