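I'm building a broken link checker. So far I've managed to get it to crawl the page at a user-specified URL and echo the full list of links, so I know it can find them. What I need to do now is check the HTTP response of each link (so that I can highlight the result in some way later).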
$html = file_get_contents($_POST['urlInput']);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
echo $url.'<br>';
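This is where my problem starts. The code above finds and echoes all the links on the page, and I'm trying to work out how to check each one. The code below doesn't work, but being completely new to PHP, I can't tell what it is and/or isn't doing!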
get_http_response_code($url);
}
function get_http_response_code($url) {
$headers = get_headers($url);
return substr($headers[0], 9, 3);
}
$get_http_response_code = get_http_response_code($url);
if ( $get_http_response_code == 200 ) {
echo "Working!";
} else {
echo "Broken!";
}
For now, I just want to understand how to make it echo "Working" or "Broken" for each link.
Thanks!
Answer 0 (score: 0)
Found a solution:
<?php
$website = $_POST['urlInput'];
$html = file_get_contents($website);
$website = preg_replace('{/$}', '', $website);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
// Handle relative paths: strip leading slashes, then prefix the site URL
// unless the link is already absolute
$url = ltrim($url, '/');
$url = isAbsoluteUrl($url) ? $url : $website.'/'.$url;
// Check the link and print its status on each iteration of the loop
get_http_response_code($url);
}
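// Requests $url with cURL and prints a colour-coded status line for it
function get_http_response_code($url) {
    $handle = curl_init($url);
    // Return the body from curl_exec() instead of printing it
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
    $response = curl_exec($handle);
    // Status code of the response; 0 if no response was received
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    // Escape the URL before echoing it into the HTML output
    $label = htmlspecialchars($url);
    if ($httpCode == 200) {
        echo $label . ": <span style='color: green'>OK</span> <br>";
    } else if ($httpCode == 301 || $httpCode == 302) {
        echo $label . ": <span style='color: orange'>Redirected</span> <br>";
    } else {
        echo $label . ": <span style='color: red'>Not OK</span> <br>";
    }
    curl_close($handle);
}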
// True only when the URL starts with http:// or https://
// (an unanchored strpos() check would also match "redirect?to=http://...")
function isAbsoluteUrl($url)
{
    return preg_match('#^https?://#i', $url) === 1;
}
?>
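Two parts of the code above are fairly rough: relative links are joined to the site URL by plain string concatenation, and only the exact codes 200, 301 and 302 are recognised. Below is a minimal sketch of how both steps could be pulled out into small functions. The names `resolve_url` and `classify_status` are hypothetical helpers that are not part of the original solution, and the resolver is deliberately simplified: it assumes the base is an absolute http(s) URL and does not collapse `./` or `../` segments.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: resolve $href against the URL of the page it came from.
// Simplified sketch: "./" and "../" segments are not normalised.
function resolve_url(string $base, string $href): string
{
    // Already absolute (has a scheme such as http:, https: or mailto:).
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) === 1) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Protocol-relative link ("//host/path"): reuse the scheme of the base page.
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;
    }
    // Root-relative link ("/path"): attach to scheme and host only.
    if ($href !== '' && $href[0] === '/') {
        return $origin . $href;
    }
    // Page-relative link: replace everything after the last "/" of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, (int) strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Hypothetical helper: map an HTTP status code to a coarse label.
// cURL reports 0 when no response was received, which ends up as 'broken'.
function classify_status(int $code): string
{
    if ($code >= 200 && $code < 300) {
        return 'ok';
    }
    if ($code >= 300 && $code < 400) {
        return 'redirected';
    }
    return 'broken';
}

echo resolve_url('http://example.com/dir/page.html', 'other.html'), "\n"; // http://example.com/dir/other.html
echo resolve_url('http://example.com/dir/page.html', '/about'), "\n";     // http://example.com/about
echo classify_status(301), "\n";                                          // redirected
```

Keeping the classification separate from the `echo` calls means the same result could later drive the highlighting mentioned in the question, rather than being baked into the HTML output.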