Question

I'm parsing an HTML page and collecting its links:

function get_links($url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the URL's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $links = array();

    // Loop through each <a> tag in the DOM and add it to the link array
    foreach ($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'));
    }

    // Return the links
    return $links;
}

$arrayLinks = get_links($url);

The only problem I'm facing is that some of the links are not fully qualified:

/image/1.jpg

or

//example.com

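That is exactly how they are returned and placed in my array. Is there a way to extract these links in PHP so that the FULL URL is returned? For the examples above:

Instead of /image/1.jpg, it would be https://example.com/image/1.jpg.

Instead of //example.com, it would be https://example.com.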

Note: I know that in JavaScript this can be done with element.href, but is there anything in PHP, preferably something I can use with the example above?
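For reference, the behaviour being asked for can be sketched by hand: resolve each href against the URL of the page it came from. The helper below is only a minimal sketch, not a built-in. The name resolve_url is hypothetical, it assumes the page URL is absolute (scheme and host present), it ignores any <base href> tag in the document, and it simplifies the reference-resolution rules of RFC 3986.

```php
<?php
// Hypothetical helper (not part of PHP or of the original post): resolves
// $href against $base, the absolute URL of the page the link was found on.
function resolve_url(string $href, string $base): string
{
    $href = trim($href);
    if ($href === '') {
        return $base; // an empty href points at the page itself
    }
    // Already absolute (has a scheme such as https: or mailto:): keep as-is.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $href; // assumption violated: base is not an absolute URL
    }
    $scheme   = $parts['scheme'];
    $origin   = $scheme . '://' . $parts['host']
              . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $basePath = $parts['path'] ?? '/';

    // Protocol-relative link (//example.com): reuse the page's scheme.
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;
    }
    // Fragment-only or query-only link: stays on the current page.
    if ($href[0] === '#') {
        $query = isset($parts['query']) ? '?' . $parts['query'] : '';
        return $origin . $basePath . $query . $href;
    }
    if ($href[0] === '?') {
        return $origin . $basePath . $href;
    }

    if ($href[0] === '/') {
        $path = $href; // root-relative link
    } else {
        // Path-relative link: resolve against the directory of the page.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;
    }

    // Set any query/fragment aside, then collapse "." and ".." segments.
    $cut    = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path   = substr($path, 0, $cut);

    $segments = explode('/', $path);
    $out = array();
    foreach ($segments as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    // A trailing "." or ".." refers to a directory, so keep the final slash.
    if (in_array(end($segments), array('.', '..'), true)) {
        $out[] = '';
    }

    return $origin . '/' . ltrim(implode('/', $out), '/') . $suffix;
}
```

Inside get_links, the array line would then become $links[] = array('url' => resolve_url($link->getAttribute('href'), $url)); so that every entry is resolved against the page that was loaded.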

PHP equivalent of JavaScript's element.href

0 answers: