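Extract all MP3 and OGG links from a string using preg_match_all

Time: 2014-05-29 18:40:34

Tags: php regex preg-match-all

I'm trying to write a regular expression that extracts all MP3 / OGG links from a sample string, but I can't get it to work. This is the sample string I'm trying to extract the MP3 / OGG files from: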

this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3" target="_blank">Download</a>

And the PHP part:

$Word = "this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3" target="_blank">Download</a>";


$Pattern = '/href=\"(.*?)\".mp3/';
preg_match_all($Pattern,$Word,$Matches);
print_r($Matches);

I also tried these:

$Pattern = '/href="([^"]\.mp3|ogg)"/';
$Pattern = '/([-a-z0-9_\/:.]+\.(mp3|ogg))/i';

So I need your help to fix this code and extract all MP3 / OGG links from that sample string.

Thanks, everyone.

2 answers:

Answer 0 (score: 1)

  

..extract all MP3 / OGG links from that sample string.

For example:

(?<=https?://(.+)?)\.(mp3|ogg)
  • $1 - URI
  • $2 - extension

Update

Yes, unfortunately, in PHP (tested on v5.5) the lookbehind pattern

(?<=https?://(.+)?)\.(mp3|ogg)

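has a limitation:

  • Compilation failed: lookbehind assertion is not fixed length at offset n

So, compare these two similar variants:

  • (?<=p1(.+)?)p2 - matches p2 only if p1 matches somewhere before it; PHP rejects this because the lookbehind is not fixed length
  • p2(?=(.+)p3) - matches p2 only if p3 matches somewhere after it; this works in PHP without a fixed length (for example with .+?)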
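
To make the difference between the two variants concrete, here is a minimal sketch. The `$text` value and the variable names are made up for illustration, and the failure of the lookbehind is what I would expect from the PCRE versions bundled with PHP, not something guaranteed for every build:

```php
<?php
// Hypothetical input, not the string from the question.
$text = 'see http://example.com/a.mp3 and https://example.org/b.ogg';

// Variable-length lookbehind: PCRE is expected to refuse to compile this,
// in which case preg_match_all() returns false.
// The @ only hides the E_WARNING that PHP raises for the compile error.
$lookbehindResult = @preg_match_all('#(?<=https?://(.+)?)\.(mp3|ogg)#', $text, $m1);

// Lookahead has no fixed-length restriction, so this compiles and matches.
$lookaheadResult = preg_match_all('#https?://(?=(.+?)\.(mp3|ogg))#', $text, $m2);

var_dump($lookbehindResult); // false if the pattern failed to compile
var_dump($lookaheadResult);  // number of matches found by the lookahead form
print_r($m2[1]);             // the part of each URL between the scheme and the extension
```

Checking the return value with `=== false` is the important part: a pattern that fails to compile does not throw, it only warns and returns false.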

For your sample:

//p2(?=.*p3)
preg_match_all("#https?://(?=(.+?)\.(mp3|ogg))#im", $Word, $Matches);

/*
[0] => Array
    (
        [0] => http://
        [1] => https://
        [2] => http://
    )

[1] => Array
    (
        [0] => domain.com/sample
        [1] => www.mydomain.com/sample2
        [2] => seconddomain.com/files/music
    )

[2] => Array
    (
        [0] => mp3
        [1] => ogg
        [2] => mp3
    )
 */
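
Because the lookahead consumes nothing, the full match in `$Matches[0]` is only the scheme, and the rest of each URL sits in groups 1 and 2. A small sketch of gluing the pieces back together, assuming the same pattern as above (`$links` is a name introduced here for illustration):

```php
<?php
$Word = 'this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3" target="_blank">Download</a>';

preg_match_all('#https?://(?=(.+?)\.(mp3|ogg))#im', $Word, $Matches);

// Rebuild each URL as: scheme + captured path + "." + captured extension.
$links = [];
foreach ($Matches[0] as $i => $scheme) {
    $links[] = $scheme . $Matches[1][$i] . '.' . $Matches[2][$i];
}

print_r($links);
```

One caveat with this approach: `.+?` will happily run past the end of a URL that has no audio extension until it finds a later `.mp3` or `.ogg`, so it is best suited to text where every link is an audio link.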


Answer 1 (score: 1)

To retrieve all the links, you can use:

((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?.(mp3|ogg))


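((https?:\/\/)? optionally matches http:// or https://

(\w+?\.)+? matches the domain groups

(\w+?\/)+ matches the final domain group and the forward slash, plus any further path segments

\w+?.(mp3|ogg)) matches a file name ending in .mp3 or .ogg.

The string you supplied contains several unescaped quotes. With those corrected and my regex added: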

$Word = "this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href=\"http://seconddomain.com/files/music.mp3\" target=\"_blank\">Download</a>";

$Pattern = '/((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?.(mp3|ogg))/im';
preg_match_all($Pattern,$Word,$Matches);
var_dump($Matches[0]);

This produces the following output:

array (size=3)
  0 => string 'http://domain.com/sample.mp3' (length=28)
  1 => string 'https://www.mydomain.com/sample2.ogg' (length=36)
  2 => string 'http://seconddomain.com/files/music.mp3' (length=39)
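
One detail worth noting: the dot before `(mp3|ogg)` in the pattern above is not escaped, so it matches any character, not only a literal dot. The sketch below compares the pattern as posted with a variant that uses `\.` instead. The input string and the variable names are hypothetical, chosen only to show the difference:

```php
<?php
// Hypothetical input: no audio file here, the path merely ends in "notmp3".
$text = 'see http://example.com/files/notmp3 here';

// Pattern as posted (unescaped dot) and a variant with the dot escaped.
$original = '/((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?.(mp3|ogg))/im';
$escaped  = '/((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?\.(mp3|ogg))/im';

$originalCount = preg_match_all($original, $text, $a);
$escapedCount  = preg_match_all($escaped, $text, $b);

var_dump($originalCount, $a[0]); // the unescaped dot lets "notmp3" slip through
var_dump($escapedCount);         // the escaped variant requires a real ".mp3" / ".ogg"
```

On the sample string from the question both patterns should find the same three links, since each of them contains a literal dot before the extension.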