Question

I want to extract mp3 URLs that do not contain a specific word from a page source.

This is the regex I use to find mp3 URLs:

https?:\/\/.+\.mp3

It works. Now I want to exclude the URLs that contain a specific word, so that I only get the URLs that do not contain it.

How can I exclude URLs where that word appears between http and .mp3?

I will use this in Qt and C++, but an answer is fine as long as it works on https://regex101.com/.
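Before excluding anything, it helps to see what the base pattern returns on a page source. The sketch below uses Python's re module only because it is easy to run; the sample page string is made up. Because .+ is greedy, two URLs on the same line come back as a single match, while the non-greedy .+? (which both answers below also use) returns them separately. In a Python raw string the / needs no escaping; regex101 needs \/ only because / is its delimiter.

```python
import re

# Made-up page-source snippet with two mp3 links on one line.
page = 'x <a href="http://a.example/one.mp3">1</a> <a href="http://b.example/two.mp3">2</a>'

# The question's pattern: greedy .+ runs to the LAST ".mp3" on the line.
greedy = re.findall(r'https?://.+\.mp3', page)

# Non-greedy .+? stops at the FIRST ".mp3" after each "http(s)://".
lazy = re.findall(r'https?://.+?\.mp3', page)

print(greedy)
print(lazy)
```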

Answer 1

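If you want to "exclude the URLs that do not contain a specific word" (that is, keep only URLs that do contain it), you can use a positive lookahead for that word, preceded by any characters, for example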

(?=.*Sing)

In JavaScript:

const word = 'Sing';
const urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
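// '\\.' is needed inside a string literal; a plain '\.' would reach the regex as '.' (any character)
let regex = new RegExp('https?://(?=.*' + word + ').+\\.mp3', 'i');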
console.log(urls.filter(v => v.match(regex)));

In PHP:

$word = 'Sing';
$urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
$regex = "/https?:\/\/(?=.*$word).+\.mp3/i";
print_r(array_filter($urls, function ($v) use ($regex) { return preg_match($regex, $v); }));

Output:

Array ( 
    [0] => http://I_like_to_sing.mp3 
)

Demo on 3v4l.org
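The same positive-lookahead filter can be sketched in Python, which makes it easy to check outside 3v4l.org. re.escape is an addition not present in the JavaScript or PHP versions: it keeps characters such as . or + in the word from being read as regex syntax. re.IGNORECASE plays the role of the i flag.

```python
import re

word = 'Sing'
urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3']

# Positive lookahead: right after "http(s)://", require that `word` occurs
# somewhere further along; re.escape treats the word as literal text.
regex = re.compile(r'https?://(?=.*' + re.escape(word) + r').+\.mp3', re.IGNORECASE)

# Keep only the URLs that contain the word.
kept = [u for u in urls if regex.search(u)]
print(kept)  # ['http://I_like_to_sing.mp3']
```

This works on a list of individual URLs. When the pattern is run over a whole page instead, the .* inside the lookahead can find the word past .mp3 in later text, which is why the update restricts how far the lookahead scans.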

Update

To exclude the URLs that do contain the specific word, you can use a negative lookahead, for example

(?![^.]*Sing)

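We use [^.] so that the lookahead only checks text before the .mp3 part. Note that [^.]* also stops at any earlier dot, such as the one in a host name like www.example.com, so a word that appears after such a dot is not detected. Here is a PHP demo: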

$word = 'Song';
$string = "some words http://I_like_to_sing.mp3 and then some other words http://Another_song.mp3 and some words at the end...";
$regex = "/(https?:\/\/(?![^.]*$word).+?\.mp3)/i";
preg_match_all($regex, $string, $matches);
print_r($matches[1]);

Output:

Array ( 
    [0] => http://I_like_to_sing.mp3
)

Demo on 3v4l.org
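A Python port of the negative-lookahead demo, as a sketch for checking the behaviour locally (re.escape is an addition). It also shows the [^.]* limitation on a URL with a dotted host name. The better pattern is a hypothetical variant, not part of the answer: its tempered token (?:(?!\.mp3)\S)*? lets the lookahead scan up to .mp3 (but not past whitespace) instead of stopping at the first dot.

```python
import re

word = 'Song'
text = ("some words http://I_like_to_sing.mp3 and then some other words "
        "http://Another_song.mp3 and some words at the end...")

# The answer's pattern: reject the match if `word` appears before the
# first '.' after "http(s)://"; .+? stops at the first ".mp3".
regex = re.compile(r'https?://(?![^.]*' + re.escape(word) + r').+?\.mp3', re.IGNORECASE)
print(regex.findall(text))  # ['http://I_like_to_sing.mp3']

# Limitation: [^.]* stops at the dot in "www.", so "Song" is never seen.
dotted = 'http://www.example.com/My_Song.mp3'
print(regex.findall(dotted))  # ['http://www.example.com/My_Song.mp3']

# Hypothetical variant: the lookahead consumes non-space characters until
# ".mp3" (the tempered token never steps onto ".mp3"), looking for `word`.
better = re.compile(
    r'https?://(?!(?:(?!\.mp3)\S)*?' + re.escape(word) + r')\S+?\.mp3',
    re.IGNORECASE)
print(better.findall(text))    # ['http://I_like_to_sing.mp3']
print(better.findall(dotted))  # []
```

Using \S instead of . also keeps a match from running across spaces into the next URL. If URLs in the page source are wrapped in quotes or tags with no spaces between them, a narrower class such as [^\s"'<>] may be needed.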

Answer 2

I hope this is a useful answer.

Here is a regex with a use case in Python 3. If you want to remove the "word" between http and .mp3, you can do it like this:

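import re

ref = "http://www.some_undesired_text_018/m102/1-225x338.mp3"

# The group captures everything between "http" and ".mp3"
_del = re.findall(r'https?(.+)\.mp3', ref)[0]

out = ref.replace(_del, "")

# _del is "://www.some_undesired_text_018/m102/1-225x338", so out is "http.mp3"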
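The code above strips the whole middle of the URL rather than dropping URLs that contain a word. For the question as asked, a two-step approach avoids lookaheads altogether: extract every mp3 URL, then filter in code. This is a sketch and not the answerer's method; the function name mp3_urls_without and the sample page are hypothetical.

```python
import re

def mp3_urls_without(source, word):
    """Return mp3 URLs in `source` that do not contain `word` (case-insensitive)."""
    # Step 1: non-greedy match of each whitespace-free "http(s)://...mp3" run.
    urls = re.findall(r'https?://\S+?\.mp3', source)
    # Step 2: plain substring test instead of a regex lookahead.
    return [u for u in urls if word.lower() not in u.lower()]

page = ("http://www.some_undesired_text_018/m102/1-225x338.mp3 "
        "http://example.com/keep_me.mp3")
print(mp3_urls_without(page, "undesired"))  # ['http://example.com/keep_me.mp3']
```

The same split carries over to Qt: iterate the matches of the extraction pattern and skip any URL that contains the word.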
