Consider the following PHP code, which scrapes a client's old static website for customer email addresses.
    $urls = explode(PHP_EOL, file_get_contents('urls.txt'));
    print '<pre>'; print_r($urls); print '</pre>';
    print '<strong>Results:</strong><br>';
    function get_emails($url) {
        $html = file_get_contents($url);
        $dom = new DOMDocument;
        @$dom->loadHTML($html);
        $links = $dom->getElementsByTagName('a');
        foreach ($links as $link){
            $href = $link->getAttribute('href');
            if (strpos($href, 'mailto') !== false) {
                echo str_replace("mailto:","",$href) . '<br>';
            }
        }
    }
    foreach ($urls as $key => $url) {
        get_emails($url);
    }
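I am reading the list of URLs from urls.txt, but the results only come from the last URL in the file. All the others are ignored. I was hoping it would return a nice list of all the customer email addresses so that we can import them into the new site.
Can anyone help diagnose the problem?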
Answer 0 (score: 1)
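This happens because of this line:

    return str_replace("mailto:","",$href) . '<br>';

A return inside the foreach exits the function immediately, so the loop stops at the first matching link and the remaining links are never examined.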
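To see the effect in isolation, here is a minimal sketch. The helper names first_only and collect_all and the sample addresses are hypothetical and only serve the illustration; they do not appear in the question.

```php
<?php
// Hypothetical helper: a return inside the loop leaves the function
// on the very first iteration, so later items are never visited.
function first_only(array $items) {
    foreach ($items as $item) {
        return $item; // exits the function immediately
    }
    return null; // reached only when $items is empty
}

// Hypothetical helper: collect every item and return once the loop is done.
function collect_all(array $items) {
    $found = array();
    foreach ($items as $item) {
        $found[] = $item;
    }
    return $found;
}

$addresses = array('a@example.com', 'b@example.com');
var_dump(first_only($addresses));  // a single string: the first address
var_dump(collect_all($addresses)); // an array holding both addresses
```

The two options that follow apply the same idea: either print inside the loop, or build an array and return it after the loop.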
1. Either:
    $urls = explode(PHP_EOL, file_get_contents('urls.txt'));
    print '<pre>'; print_r($urls); print '</pre>';
    print '<strong>Results:</strong><br>';
    function get_emails($url) {
        $html = file_get_contents($url);
        $dom = new DOMDocument;
        @$dom->loadHTML($html); // @ hides warnings from malformed legacy HTML
        $links = $dom->getElementsByTagName('a');
        foreach ($links as $link){
            $href = $link->getAttribute('href');
            // only handle mailto links, as in the question's code
            if (strpos($href, 'mailto') !== false) {
                // echo prints and lets the loop continue; no return here
                echo str_replace("mailto:","",$href) . '<br>';
            }
        }
    }
    foreach ($urls as $key => $url) {
        get_emails($url);
    }
2. Or as follows:
    $urls = explode(PHP_EOL, file_get_contents('urls.txt'));
    print '<pre>'; print_r($urls); print '</pre>';
    print '<strong>Results:</strong><br>';
    function get_emails($url) {
        $html = file_get_contents($url);
        $data = array(); // define the array that collects the results
        $dom = new DOMDocument;
        @$dom->loadHTML($html); // @ hides warnings from malformed legacy HTML
        $links = $dom->getElementsByTagName('a');
        foreach ($links as $link){
            $href = $link->getAttribute('href');
            // only handle mailto links, as in the question's code
            if (strpos($href, 'mailto') !== false) {
                $data[] = str_replace("mailto:","",$href) . '<br>'; // append each value to the array
            }
        }
        return $data; // return once, after the loop has finished
    }
    foreach ($urls as $key => $url) {
        print_r(get_emails($url));
    }
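One more possibility worth ruling out, which the answer above does not cover: if urls.txt was saved with Windows line endings (\r\n) and the script runs where PHP_EOL is "\n", then explode(PHP_EOL, ...) leaves a trailing "\r" on every URL except the last, and those requests may fail. That would match the symptom, but it is an assumption about the file, not something the question confirms. A minimal sketch under that assumption follows; parse_url_list and extract_emails are hypothetical helper names, and the sample URLs and HTML are made up for the illustration.

```php
<?php
// Hypothetical helper: split a URL list into clean lines.
// \R matches \n, \r\n and \r, so the result does not depend on PHP_EOL.
function parse_url_list($text) {
    $lines = array_map('trim', preg_split('/\R/', $text));
    // Drop empty lines, for example the one after a trailing newline.
    return array_values(array_filter($lines, 'strlen'));
}

// Hypothetical helper: collect mailto: addresses from an HTML string.
function extract_emails($html) {
    $emails = array();
    $dom = new DOMDocument;
    @$dom->loadHTML($html); // @ hides warnings from malformed legacy HTML
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));
        if (stripos($href, 'mailto:') === 0) {
            // Strip the scheme, then any ?subject=... part after the address.
            $address = substr($href, strlen('mailto:'));
            $parts = explode('?', $address, 2);
            if ($parts[0] !== '') {
                $emails[] = $parts[0];
            }
        }
    }
    // Remove duplicates and reindex the array.
    return array_values(array_unique($emails));
}

$urls = parse_url_list("http://example.com/a\r\nhttp://example.com/b\r\n");
print_r($urls); // two URLs, neither with a trailing \r

$html = '<p><a href="mailto:jo@example.com?subject=Hi">Jo</a> <a href="/about">About</a></p>';
print_r(extract_emails($html)); // one address: jo@example.com
```

With real pages, the HTML string would come from file_get_contents($url) as in the question, and the text passed to parse_url_list would come from file_get_contents('urls.txt'). Returning plain addresses without the '<br>' suffix keeps the array easier to import; formatting can be added when printing.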