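I have a function that fetches a specific link from a specific website. It works on its own, but the trouble starts when I call it inside a while loop: for some reason the links keep growing in length.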
function getLinks($link) {
    $link1 = $link;
    $content = file_get_contents($link1);
    $content = str_replace("<", "", $content);
    $content = str_replace(">", "", $content);
    preg_match("~previous page.+?next page~i", $content, $match);
    preg_match("~\"(/.+?)\"~i", $match[0], $match);
    $link2 = "https://en.wiktionary.org".$match[1];
    echo $link1."<br>";
    echo $link2."<br>";
    return $link2;
}
$firstLink = getLinks("https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages");
Result of $firstLink = getLinks():
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
^ --- See how it works as intended? But then when I put it in a while loop:
$count = 0;
while ($count < 5) {
    $count++;
    $firstLink = getLinks($firstLink);
}
The result is completely messed up. The links start stacking on top of each other, like this:
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
This is driving me crazy, so if anyone knows what I'm doing wrong, please let me know. Thanks.
A regular function in a while loop:
function addOne($num) {
    echo $num."<br>";
    $num++;
    return $num;
}
$num = 0;
$count = 0;
while ($count < 5) {
    $count++;
    $num = addOne($num);
}
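^ --- works fine
Answer 0 (score: 1)
Your problem is HTML entities. The href you extract from the page source has its ampersands escaped as &amp;, so the returned URL has to be decoded with html_entity_decode() before it is requested again; otherwise every pass adds another amp; to the query string. I rewrote the function to fix that, to skip URLs that were already fetched, and to make it more efficient. Call it with a depth parameter, which in your case is the iteration limit of your while loop: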
function getLinks($linkd, $depth, $checked = array()) {
    // Accept either a single URL or an array of URLs.
    if (!is_array($linkd)) $linkd = array($linkd);
    foreach ($linkd as $link) {
        // Skip URLs that were already fetched.
        if (isset($checked[$link])) continue;
        $link1 = $link;
        $content = file_get_contents($link1);
        $content = str_replace("<", "", $content);
        $content = str_replace(">", "", $content);
        preg_match("~previous page.+?next page~i", $content, $match);
        preg_match("~\"(/.+?)\"~i", $match[0], $match);
        // The href comes from HTML source, so decode entities such as &amp; before reusing it.
        $link2 = html_entity_decode("https://en.wiktionary.org".$match[1]);
        echo $link1."<br>";
        echo $link2."<br>";
        $checked[$link] = true;
        if ($depth > 0) {
            // Follow the next page until the depth budget is used up.
            $depth--;
            return getLinks($link2, $depth, $checked);
        } else {
            return $link2;
        }
    }
}
$firstLink = "https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages";
$firstLink = getLinks($firstLink, 5);
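To see the entity problem without touching the network, here is a minimal sketch. The helper name extractNextLink, its href-based pattern, and the sample markup are assumptions made up for illustration; they are not part of the code above, and the real Wiktionary markup may differ.

```php
<?php
// Hypothetical helper (not from the original post): take the first
// root-relative href in an HTML fragment and return it as an absolute URL.
function extractNextLink(string $html, string $base = "https://en.wiktionary.org"): ?string {
    if (!preg_match('~href="(/[^"]+)"~i', $html, $match)) {
        return null; // no root-relative link in this fragment
    }
    // HTML source escapes "&" as "&amp;" inside attribute values,
    // so decode the entities before using the value as a URL.
    return $base . html_entity_decode($match[1]);
}

// Sample markup: an assumption modeled on the "next page" link from the question.
$html = '<a href="/w/index.php?title=Category:English_verbs&amp;pagefrom=BAGSIE%0Abagsie#mw-pages">next page</a>';

// What plain concatenation without decoding would produce: "&amp;" survives in the URL.
preg_match('~href="(/[^"]+)"~i', $html, $m);
$raw = "https://en.wiktionary.org" . $m[1];

$decoded = extractNextLink($html);

echo $raw . "\n";
echo $decoded . "\n";
```

With a decoded return value like this, the plain while loop from the question ($firstLink = getLinks($firstLink);) should stop accumulating amp; fragments, because each request is made with a real & separator instead of the escaped one.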