Question

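I need to copy links from a lot of pages on the same site. The pages look like /download.php?id=xxxxx, and I only have to add 1 to the id to get the next page I want... On each of those pages I need to pull one link out of the source, like: href="http://www.site.com/xxxxxxxxxxxx" (where the x's vary).

Is this possible? Thanks.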

Answer 1

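Don't use regex to parse HTML

Probably the biggest mistake people make when trying to get URLs or link text from a web page is reaching for regular expressions. The job can be done that way, but running preg calls over the whole document in a loop carries a high overhead. The correct way, and the faster and more flexible one, is to use the DOM. By using the DOM in the getLinks function, it is easy to build an array with the matching links on the page as keys and the link text as values. That array can then be looped over like any other array, or manipulated in whatever way you need. Note that error suppression is used when loading the HTML. This silences warnings about invalid markup, such as HTML entities not defined in the DOCTYPE. In a production environment the display of errors would of course be turned off anyway.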

<?php
    function getLinks($link){
        $ret = array();

        /*** a new dom object ***/
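        $dom = new DOMDocument;

        /*** remove silly white space (set before loading; it only takes effect at parse time) ***/
        $dom->preserveWhiteSpace = false;

        /*** get the HTML via FGC, 
        though cURL would be preferable, but that's out of scope of the question..
       (@suppress those errors) ***/
        $html = @file_get_contents($link);
        if ($html === false || $html === '') {
            /*** nothing could be fetched, so there is nothing to parse ***/
            return $ret;
        }
        @$dom->loadHTML($html);

        /*** get the links from the HTML ***/
        $links = $dom->getElementsByTagName('a');

        /*** loop over the links ***/
        foreach ($links as $tag){
            /*** only add download links to the return array ***/
            $href = $tag->getAttribute('href');
            /*** strict !== because strpos returns 0 when the match is at the very start ***/
            if(strpos($href,'/download.php?id=')!==false){
                 /*** textContent also works when the anchor has no child nodes ***/
                 $ret[$href] = trim($tag->textContent);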
            }
        }
        return $ret;
    }
?>

Example usage

<?php
    /*** a link to search ***/
    $link = "http://www.site.com";

    /*** get the links ***/
    $urls = getLinks($link);

    /*** check for results ***/
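    if(count($urls) > 0){
        foreach($urls as $key=>$value){
            /*** read the id from the query string; works for relative and absolute hrefs ***/
            parse_str((string) parse_url($key, PHP_URL_QUERY), $query);
            echo $key . ' - '. $value . ' - ' . ($query['id'] ?? '') . '<br>';
        }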
    }else{
        echo "No links found at $link";
    }
?>
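
The question also asks about walking through the pages by adding 1 to the id, so here is a rough sketch of that part. It is only an illustration under a few assumptions: extractLinks and collectLinks are hypothetical helper names, $fetch is any callable you supply that returns the page HTML (or false on failure), and the prefix 'http://www.site.com/' is just taken from the href shown in the question, so it will also match any other absolute link to that site. This version uses libxml_use_internal_errors instead of the @ operator to keep the parser quiet.

```php
<?php
/**
 * Return every href in $html that starts with $prefix.
 * (Hypothetical helper, not part of any library.)
 */
function extractLinks(string $html, string $prefix): array
{
    if ($html === '') {
        return []; // loadHTML refuses an empty string, so bail out early
    }

    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them for sloppy HTML.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($dom->getElementsByTagName('a') as $tag) {
        $href = $tag->getAttribute('href');
        // === 0 means "starts with $prefix"
        if (strpos($href, $prefix) === 0) {
            $found[] = $href;
        }
    }
    return $found;
}

/**
 * Walk the ids from $firstId to $lastId and collect the matching links per page.
 * $fetch is a caller-supplied callable: it takes a URL and returns HTML or false.
 */
function collectLinks(string $baseUrl, int $firstId, int $lastId, string $prefix, callable $fetch): array
{
    $result = [];
    for ($id = $firstId; $id <= $lastId; $id++) {
        $html = $fetch($baseUrl . '/download.php?id=' . $id);
        if ($html === false) {
            continue; // skip pages that could not be fetched
        }
        $result[$id] = extractLinks($html, $prefix);
    }
    return $result;
}
```

For a real run you could pass something like function ($url) { return @file_get_contents($url); } as $fetch, for example collectLinks('http://www.site.com', 1000, 1050, 'http://www.site.com/', $fetch), where the id range is made up. Taking the fetcher as a parameter makes it easy to swap in cURL later or to add a short pause between requests so the site is not hammered.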
