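Regex pattern to match a specific URL

Time: 2015-04-27 12:20:57

Tags: java regex

I have a large text and I only want to use some of the information in it. The text looks like this: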

Some random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_1_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_2_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_3_av.m3u8

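I only want the http text. There are several in the text, but I only need one of them. The regular expression should be "starts with http and ends with .m3u8".

I have looked through glossaries of all the different expressions, but they are very confusing to me. I tried "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{12,30})([\/\w \.-]*)*\/?$/" as my pattern. But is that enough?

All help is appreciated. Thanks.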

3 answers:

Answer 0 (score: 1)

Assuming your text is line-separated wherever your example shows a new line, here is a working snippet:

String text = 
"Some random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
"More random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
// removed some for brevity
"More random text here" +
System.getProperty("line.separator") +
// added counter-example ending with "NOPE"
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.NOPE";

// Multi-line pattern:
//                           ┌ line starts with http
//                           |    ┌ any 1+ character reluctantly quantified
//                           |    |  ┌ dot escape
//                           |    |  |  ┌ ending text
//                           |    |  |  |   ┌ end of line marker
//                           |    |  |  |   |
Pattern p = Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());
}

Output

http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8

Edit

For a more refined "filter" on the "index_x" file of the URL, you can simply add it to the Pattern between the protocol and the end of the line, for example:

Pattern.compile("^http.+?index_0.+?\\.m3u8$", Pattern.MULTILINE);
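
Since the question only needs one of the URLs, the loop can also stop at the first hit. The following is a minimal sketch of that idea, reusing the multi-line pattern from this answer. The class name FirstPlaylistUrl, the method firstMatch and the example.invalid URLs are hypothetical and only serve as illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstPlaylistUrl {
    // Same multi-line pattern as in the answer: a line that starts with
    // "http" and ends with ".m3u8".
    private static final Pattern M3U8_LINE =
            Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);

    // Returns the first matching line, or an empty Optional if there is none.
    static Optional<String> firstMatch(String text) {
        Matcher m = M3U8_LINE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample input, shaped like the text in the question.
        String text = String.join("\n",
                "Some random text here",
                "http://example.invalid/a/index_0_av.m3u8",
                "More random text here",
                "http://example.invalid/a/index_1_av.m3u8");
        // prints http://example.invalid/a/index_0_av.m3u8
        System.out.println(firstMatch(text).orElse("no match"));
    }
}
```

Calling find() once instead of looping is enough here, because the matcher scans the input from the start and reports the earliest matching line.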

Answer 1 (score: 0)

I have not tested it, but this should do the trick:

^(http:\/\/.*\.m3u8)
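
To try this pattern from Java, it has to be written as a string literal and compiled. The sketch below is one possible way to do that, not the answerer's own code. Two things in it are assumptions: Pattern.MULTILINE is added so that "^" matches at the start of every line rather than only at the start of the whole text, and the class name Answer1Sketch, the method findAll and the example.invalid URLs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Answer1Sketch {
    // "/" has no special meaning in a Java regex, so "\/" is written as "/".
    // MULTILINE is an assumption added here; the pattern in the answer
    // does not say which flags to use.
    static final Pattern URL =
            Pattern.compile("^(http://.*\\.m3u8)", Pattern.MULTILINE);

    // Collects capture group 1 of every match in the text.
    static List<String> findAll(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, shaped like the text in the question.
        String text = "Some random text here\n"
                + "http://example.invalid/x/index_0_av.m3u8\n"
                + "More random text here\n"
                + "http://example.invalid/x/index_1_av.m3u8";
        findAll(text).forEach(System.out::println);
    }
}
```

Because the pattern has no "$" at the end, a line such as http://example.invalid/x.m3u81 still produces a match; the captured group simply stops after ".m3u8".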

Answer 2 (score: 0)

This is @capnibishop's answer, with a slight change.

^(http://).*(/index_1)[^/]*\.m3u8$

Added the missing "$" sign at the end. This ensures that it matches

http://something.m3u8

rather than

http://something.m3u81

Added a condition that index_1 must appear in the last path segment of the line, which means it will match:

http://something/index_1_something_else.m3u8

rather than

http://something/index_1/something_else.m3u8
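
The two changes can be checked with a small Java sketch. It is only an illustration under stated assumptions: the class name Answer2Check and the method accepts are hypothetical, and the third sample URL is made up to show the effect of the trailing "$" on a URL that also contains /index_1 (the short http://something.m3u8 example has no /index_1, so the full pattern would not match it).

```java
import java.util.regex.Pattern;

final class Answer2Check {
    // The pattern from this answer, written as a Java string literal.
    static final Pattern INDEX_1 =
            Pattern.compile("^(http://).*(/index_1)[^/]*\\.m3u8$");

    // True if the single line given is accepted by the pattern.
    static boolean accepts(String line) {
        return INDEX_1.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://something/index_1_something_else.m3u8",  // true
            "http://something/index_1/something_else.m3u8",  // false: "/" after index_1
            "http://something/index_1_something_else.m3u81"  // false: text after .m3u8
        };
        for (String s : samples) {
            System.out.println(accepts(s) + "  " + s);
        }
    }
}
```

To apply the same pattern to the whole multi-line text instead of one line at a time, it would need to be compiled with Pattern.MULTILINE, as in Answer 0.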