Using a regex function inside a while loop

Date: 2015-10-23 01:49:04

Tags: php regex hyperlink while-loop preg-match

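I have a function that fetches a specific link from a specific website. It works on its own, but the problem starts when I call it inside a while loop: for some reason the links keep growing in length.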

function getLinks($link) {

$link1 = $link;
$content = file_get_contents($link1);

$content = str_replace("<", "", $content);
$content = str_replace(">", "", $content);

preg_match("~previous page.+?next page~i", $content, $match);
preg_match("~\"(/.+?)\"~i", $match[0], $match);
$link2 = "https://en.wiktionary.org".$match[1];

echo $link1."<br>";
echo $link2."<br>";

return $link2;

}


$firstLink = getLinks("https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages");

Output of $firstLink = getLinks():

https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages

^--- See how it works when called once like this? But then when I put it inside a while loop:

$count = 0; 
while ($count < 5) {

$count++;
$firstLink = getLinks($firstLink);

}

The result is completely messed up; the links start stacking on top of each other, like this:

https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&amp%3Bamp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&amp%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages

This is driving me crazy, so if anyone knows what I am doing wrong, please let me know. Thanks.

An ordinary function inside a while loop:

function addOne($num) {

echo $num."<br>";   
$num++;
return $num;    

}

$num = 0;
$count = 0;
while ($count < 5) {

$count++;
$num = addOne($num);    

}

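^--- works fine

1 answer:

Answer 0 (score: 1)

Your problem is HTML entities: the link extracted from the page source contains `&amp;` instead of `&`, and it has to be decoded before it is requested again. I rewrote the function to fix that, skip URLs that have already been visited, and make it more efficient. You call it with a depth parameter, which in your case takes the place of your while loop's iteration count.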
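To show the entity problem in isolation, here is a minimal sketch that needs no network access. The `$href` value is a made-up example of how the link presumably looks in the page source; it is not copied from the real page.

```php
<?php
// Hypothetical href as it would appear in raw HTML: "&" is written as "&amp;".
$href = '/w/index.php?title=Category:English_verbs&amp;pagefrom=BAGSIE%0Abagsie#mw-pages';

// Using the raw attribute value keeps the literal "&amp;" in the URL.
$raw = 'https://en.wiktionary.org' . $href;

// Decoding first turns "&amp;" back into a plain "&".
$decoded = 'https://en.wiktionary.org' . html_entity_decode($href);

echo $raw, "\n";
// https://en.wiktionary.org/w/index.php?title=Category:English_verbs&amp;pagefrom=BAGSIE%0Abagsie#mw-pages
echo $decoded, "\n";
// https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
```

When the undecoded URL is requested again, the server most likely treats `amp;pagefrom` as a separate parameter and echoes it back into the next link, which would explain why the URLs grow on every iteration. The first call only looks correct because the browser renders `&amp;` as `&` when the link is echoed. The rewritten function: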

function getLinks($linkd, $depth, $checked=array()) {

    if(!is_array($linkd)) $linkd=array($linkd);
    foreach($linkd as $link)
    {
        if(isset($checked[$link])) continue;
        $link1 = $link;
        $content = file_get_contents($link1);

        $content = str_replace("<", "", $content);
        $content = str_replace(">", "", $content);

        preg_match("~previous page.+?next page~i", $content, $match);
        preg_match("~\"(/.+?)\"~i", $match[0], $match);
        $link2 = "https://en.wiktionary.org".$match[1];

        echo $link1."<br>";
        echo $link2."<br>";

        $checked[$link] = true;

        if($depth>0)
        {
            $depth--;
            return getLinks(html_entity_decode($link2), $depth, $checked);
        }
        else
        {
            return $link2;
        }

    }
}


$firstLink = "https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages";

$firstLink = getLinks($firstLink, 5);
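
If you would rather keep a plain loop instead of recursion, the same fix can be sketched iteratively. This is only an illustration under stated assumptions: `extractNextLink` and `followLinks` are hypothetical names, the regex assumes the next link is written as `<a href="...">next page</a>`, and the fetcher is passed in as a callable so the loop can be tried without hitting the site.

```php
<?php
// Hypothetical helper: pull the "next page" href out of raw HTML and
// decode HTML entities so "&amp;" becomes "&" before the URL is reused.
function extractNextLink(string $html): ?string
{
    // Assumed markup: <a href="..." ...>next page</a>
    $pattern = '~<a\s[^>]*href="([^"]+)"[^>]*>\s*next page\s*</a>~i';
    if (!preg_match($pattern, $html, $m)) {
        return null; // no "next page" link on this page
    }
    return 'https://en.wiktionary.org' . html_entity_decode($m[1]);
}

// Follow up to $limit "next page" links, starting from $url.
// $fetch is any callable that takes a URL and returns HTML, or false on failure.
function followLinks(string $url, int $limit, callable $fetch): array
{
    $visited = [$url];
    for ($i = 0; $i < $limit; $i++) {
        $html = $fetch($url);
        if ($html === false) {
            break; // request failed
        }
        $next = extractNextLink($html);
        if ($next === null || in_array($next, $visited, true)) {
            break; // last page reached, or a URL we have already seen
        }
        $visited[] = $next;
        $url = $next;
    }
    return $visited;
}
```

Against the live site it would be called as `$links = followLinks($firstLink, 5, 'file_get_contents');`, which returns the starting URL followed by every decoded link that was found.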