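I'm working on a simple application that scans an array of websites. For each site I want to collect the URLs it links to in an array, and then store that array inside another array. My problem is that only the results for the first domain in the array are showing up (sorry, what I described earlier was wrong).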
<?php
$arrDomains = array('http://example1.com/', 'http://example2.com/');
$arrExternals = array();
for ($i = 0; $i < count($arrDomains); $i++) {
    $domain = test_input($arrDomains[$i]);
    $domain = filter_var($domain, FILTER_SANITIZE_URL);
    // START HERE
    $html = file_get_contents($domain);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    // grab all the links (<a> elements) on the page
    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->evaluate("/html/body//a");
    $external = array();
    for ($i = 0; $i < $hrefs->length; $i++) {
        $href = $hrefs->item($i);
        $url = $href->getAttribute('href');
        if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
            if (strpos($url, 'mailto') === false) { // exclude emails
                if (!in_array($url, $external)) {
                    array_push($external, $url);
                }
            }
        }
    }
    array_push($arrExternals, $external);
}
?>
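The symptom can be reproduced without fetching any pages. In this hypothetical sketch the domains and per-page link counts are made-up stand-ins for $arrDomains and $hrefs->length, and the inner loop reuses $i the same way the code above does:

```php
<?php
// Hypothetical stand-in for the scanner: each domain maps to the number of
// <a> elements its page would contain (made-up values, no network access).
$linkCounts = ['http://example1.com/' => 3, 'http://example2.com/' => 2];
$domains = array_keys($linkCounts);
$visited = [];

for ($i = 0; $i < count($domains); $i++) {
    $visited[] = $domains[$i];
    $n = $linkCounts[$domains[$i]]; // plays the role of $hrefs->length
    // Same shape as the inner loop above: it reuses $i as its counter.
    for ($i = 0; $i < $n; $i++) {
        // per-link work would go here
    }
    // Here $i === $n, so the outer $i++ jumps past the end of $domains.
}

print_r($visited); // only http://example1.com/ is listed
```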
Answer (score: 1)
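You need to rename one of the $i variables: the inner for loop also uses $i as its counter, so it overwrites the counter of the outer (first) for loop. When the inner loop finishes, $i holds the number of <a> elements found on the first page, which for any page with more than a handful of links is already past the end of $arrDomains, so the outer loop stops after the first domain. I renamed the inner counter to $j: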
$arrDomains = array('http://example1.com/', 'http://example2.com/');
$arrExternals = array();
for ($i = 0; $i < count($arrDomains); $i++) {
    // test_input() is the asker's own helper; it is not a PHP built-in
    $domain = test_input($arrDomains[$i]);
    $domain = filter_var($domain, FILTER_SANITIZE_URL);
    // START HERE
    $html = file_get_contents($domain);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    // grab all the links (<a> elements) on the page
    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->evaluate("/html/body//a");
    $external = array();
    // $j instead of $i, so the outer loop's counter is left untouched
    for ($j = 0; $j < $hrefs->length; $j++) {
        $href = $hrefs->item($j);
        $url = $href->getAttribute('href');
        if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
            if (strpos($url, 'mailto') === false) { // exclude emails
                if (!in_array($url, $external)) {
                    array_push($external, $url);
                }
            }
        }
    }
    array_push($arrExternals, $external);
}
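Another way to avoid the collision entirely is to iterate with foreach, which needs no index variables at all. The sketch below is one possible refactoring, not the answer's code: extract_links() and scan_domains() are hypothetical names, the asker's test_input() sanitization is left out, and results are keyed by domain so it is clear which links came from which site. Parsing is split from fetching so it can be tried on a hardcoded HTML string:

```php
<?php
// Hypothetical helper: returns the unique absolute, non-mailto URLs found in
// the href attributes of <a> elements inside <body>.
function extract_links(string $html): array
{
    $links = [];
    if ($html === '') {
        return $links; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('/html/body//a[@href]') as $anchor) {
        $url = $anchor->getAttribute('href');
        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            continue; // skip relative links such as "/about"
        }
        if (strpos($url, 'mailto') !== false) {
            continue; // exclude e-mail links, as in the original code
        }
        $links[$url] = true; // using the URL as a key removes duplicates
    }
    return array_keys($links);
}

// Hypothetical driver: foreach has no counter, so nothing can be overwritten.
function scan_domains(array $domains): array
{
    $results = [];
    foreach ($domains as $domain) {
        $html = @file_get_contents($domain); // false on network errors
        $results[$domain] = $html === false ? [] : extract_links($html);
    }
    return $results;
}

// Usage without network access:
$sample = '<html><body><a href="http://a.example/">A</a>'
        . '<a href="/relative">R</a><a href="mailto:x@example.com">M</a>'
        . '<a href="http://a.example/">A again</a></body></html>';
print_r(extract_links($sample)); // only http://a.example/
```

With real sites you would call scan_domains($arrDomains) and get one entry per domain, even if an earlier page contains hundreds of links.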