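I am trying to create a script that loads the URLs from sitemap.xml and puts them into an array. It should then load all the pages one by one and print something after each page.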
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
// Parse the XML and collect the links into an array
foreach ($DomNodeList as $url) {
    $urls[] = $url->nodeValue;
}
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    echo $url."<br />";
    flush();
    ob_flush();
}
?>
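It still doesn't work. It loads for a long time and prints nothing. I think flush() is not working.
Does anyone see the problem?
Thank you very much, Filip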
Answer 0 (score: 0)
I would run something like this:
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
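// Collect every <loc> value from the sitemap
foreach ($DomNodeList as $url) {
    $urls[] = $url->nodeValue;
}
// Fetch each page and print its URL followed by its content
foreach ($urls as $url) {
    $data = file_get_contents($url);
    echo $url."<br />". $data;
}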
?>
Or even better, use one loop instead of two:
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
// Single pass: record each URL, fetch it, and print it right away
foreach ($DomNodeList as $url) {
    $curURL = $url->nodeValue;
    $urls[] = $curURL;
    $data = file_get_contents($curURL);
    echo $curURL."<br />". $data;
}
?>
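On the flushing problem from the question: the order of the two calls may matter. ob_flush() moves PHP's own output buffer to the next layer and flush() then asks that layer to send it, so calling flush() first can leave each line waiting until the next iteration. Below is a minimal sketch of that idea, not a definitive fix. The function names extractSitemapUrls, flushOutput and fetchAndReport are hypothetical helpers introduced here, and the 30-second timeout is an assumed value.

```php
<?php
// Sketch only: these helper names are hypothetical, not from the original post.

/**
 * Return the trimmed text of every <loc> element in a sitemap XML string.
 * Returns an empty array when the input is empty or not well-formed.
 */
function extractSitemapUrls(string $xml): array
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    // loadXML() returns false on malformed input; @ hides the parser warning.
    if ($xml === '' || @$doc->loadXML($xml) === false) {
        return [];
    }
    $urls = [];
    foreach ($doc->getElementsByTagName('loc') as $node) {
        $urls[] = trim($node->nodeValue);
    }
    return $urls;
}

/**
 * Push pending output towards the client:
 * PHP's output buffer first (if one is active), then the server-side buffer.
 */
function flushOutput(): void
{
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}

/**
 * Fetch each URL with cURL and print one status line per page.
 * The timeout is an assumed value so a single slow page cannot stall the run.
 */
function fetchAndReport(array $urls, int $timeoutSeconds = 30): void
{
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        $data = curl_exec($ch);
        $status = ($data === false) ? 'failed: ' . curl_error($ch) : 'ok';
        echo htmlspecialchars($url) . ' (' . $status . ")<br />\n";
        flushOutput();
    }
}

// Usage (assumes sitemap.xml sits next to the script):
// fetchAndReport(extractSitemapUrls(file_get_contents('sitemap.xml')));
```

If nothing appears until the very end even with this order, the buffering is probably happening outside PHP (for example server-side compression or a proxy), which this sketch cannot control.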