Question

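I have been using preg_match to grab URLs from an HTML file, but I want to extract only the URLs that have .mp3 as their extension. I was told to try DOM, and I have been trying to fix the code, but it doesn't work. No matter what I do, I get a blank page.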

What am I doing wrong?

<?php
    $url = 'http://www.mp3olimp.net/miley-cyrus-when-i-look-at-you/';
    $html = @file_get_html($url);
    $dom = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc); 
    $links = $xpath->query('//a[ends-with(@href, ".mp3")]/@href');

    echo $links;
?>

Answer 1

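There are several problems!

As already pointed out, remove the @ in front of file_get_html() so that you can see the errors.
file_get_html() is not a built-in PHP function (it comes from the Simple HTML DOM library); file_get_contents($url) will work for fetching the HTML content.
Typo: $dom = should be $doc =
Another annoying problem is that the HTML source is malformed, which causes parser errors later on.
ends-with() is only supported in XPath 2.0, and PHP uses XPath 1.0, so you have to find another way to check the ending of the href. A regular expression should do the trick.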
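
For illustration, here is a minimal sketch of how the points above could fit together while staying with DOM. It is not the asker's final code: the sample markup is hypothetical and stands in for the downloaded page, and the ends-with() check is replaced by an XPath 1.0 substring() comparison rather than a regular expression. Note that the comparison is case-sensitive, so a link ending in .MP3 would not match.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
// In real use this would come from file_get_contents($url).
$html = <<<HTML
<html><body>
<a href="/songs/one.mp3">One</a>
<a href="/about.html">About</a>
<a href="http://example.com/two.mp3">Two</a>
<p>Unclosed paragraph
</body></html>
HTML;

// Collect parser warnings internally instead of printing them;
// real-world HTML is often malformed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// XPath 1.0 has no ends-with(), so compare the last four characters of href.
$nodes = $xpath->query('//a[substring(@href, string-length(@href) - 3) = ".mp3"]/@href');

// query() returns a DOMNodeList, which cannot be echoed directly; loop over it.
$links = [];
foreach ($nodes as $attr) {
    $links[] = $attr->nodeValue;
}

print_r($links);
```

The loop at the end matters because echoing the result of query() directly, as in the question, does not print the links.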

Answer 2

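// $url is the page address, as defined in the question.
$input = file_get_contents($url);
// Match <a> tags whose href ends in .mp3; the dot is escaped so it matches a literal period.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?\\.mp3)\\1[^>]*>(.*)<\/a>";
// s: dot also matches newlines, i: case-insensitive, U: quantifiers are ungreedy by default.
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
  foreach ($matches as $match) {
    // $match[2] = link address
    // $match[3] = link text
  }
}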
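
To see what the loop receives, here is a small sketch that runs the same kind of pattern on a sample string instead of a downloaded page. The sample markup and the $found array are assumptions made for the example, and the dot before mp3 is escaped here so that it matches only a literal period.

```php
<?php
// Hypothetical input standing in for file_get_contents($url).
$input = '<a href="/songs/one.mp3">One</a> '
       . '<a href="/about.html">About</a> '
       . '<a href="/two.mp3">Two</a>';

$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?\\.mp3)\\1[^>]*>(.*)<\/a>";

$found = [];
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[2] is the link address, $match[3] is the link text.
        $found[] = [$match[2], $match[3]];
    }
}

print_r($found);
```

Only the two .mp3 links end up in $found; the link to /about.html is skipped because its href does not end in .mp3.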

Scraping links from HTML
