Question

I have a large block of text and I only want to use certain pieces of information from it. The text looks like this:

Some random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_1_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_2_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_3_av.m3u8

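I only want the http text. There are several in the text, but I only need one of them. The regular expression should be "starts with http and ends with .m3u8".

I looked through glossaries of all the different expressions, but they are very confusing to me. I tried "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{12,30})([\/\w \.-]*)*\/?$/" as my pattern. But is that enough?

Any help is appreciated. Thanks.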

Answer 1

Assuming each line of your text is delimited by a line separator, as in your example, here is a snippet that works:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = 
"Some random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
"More random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
// removed some for brevity
"More random text here" +
System.getProperty("line.separator") +
// added counter-example ending with "NOPE"
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.NOPE";

// Multi-line pattern:
//                           ┌ line starts with http
//                           |    ┌ any 1+ character reluctantly quantified
//                           |    |  ┌ dot escape
//                           |    |  |  ┌ ending text
//                           |    |  |  |   ┌ end of line marker
//                           |    |  |  |   |
Pattern p = Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());
}

Output

http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
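Since the question asks for only one of the URLs, the loop can stop at the first hit. The following is a minimal sketch under that assumption; the class name `FirstM3u8Url`, the method `firstUrl`, and the `example.invalid` URLs are hypothetical and only serve the illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstM3u8Url {
    // Same multi-line pattern as in the snippet above.
    private static final Pattern M3U8_LINE =
            Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);

    // Returns the first line that starts with http and ends with .m3u8, if any.
    static Optional<String> firstUrl(String text) {
        Matcher m = M3U8_LINE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the hosts are placeholders.
        String text = String.join("\n",
                "Some random text here",
                "http://example.invalid/a/index_0_av.m3u8",
                "More random text here",
                "http://example.invalid/a/index_1_av.m3u8");
        // Prints the index_0 URL, because it is the first matching line.
        System.out.println(firstUrl(text).orElse("no match"));
    }
}
```

Returning an `Optional` leaves it to the caller to decide what to do when the text contains no matching line.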

Edit

To refine the "filter" by the URL's "index_x" file, you only need to add it to the Pattern between the protocol and the end of the line, for example:

Pattern.compile("^http.+?index_0.+?\\.m3u8$", Pattern.MULTILINE);
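If the index needs to vary, the pattern could be built from a number. This is only a sketch: the class `IndexFilter` and its methods are hypothetical, and the trailing underscore after the index is an added assumption based on the sample URLs (`index_1_av.m3u8`), so that index 1 does not also pick up index 10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexFilter {
    // Builds the multi-line pattern for one specific index number.
    // Assumption: the index is always followed by "_", as in the sample URLs.
    static Pattern forIndex(int index) {
        return Pattern.compile(
                "^http.+?index_" + index + "_.+?\\.m3u8$", Pattern.MULTILINE);
    }

    // Collects every line that matches the pattern for the given index.
    static List<String> urlsForIndex(String text, int index) {
        List<String> urls = new ArrayList<>();
        Matcher m = forIndex(index).matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the hosts are placeholders.
        String text = String.join("\n",
                "http://example.invalid/a/index_1_av.m3u8",
                "More random text here",
                "http://example.invalid/a/index_10_av.m3u8");
        // Only the index_1 line is printed; the index_10 line is skipped.
        System.out.println(urlsForIndex(text, 1));
    }
}
```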

Answer 2

I haven't tested it, but this should do the trick:

^(http:\/\/.*\.m3u8)
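One way to try this pattern out in Java is sketched below. The class `Answer2Check` and the sample lines are hypothetical, and `Pattern.MULTILINE` is an addition not present in the answer; without it, `^` would only match at the very start of the whole text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer2Check {
    // The pattern from this answer, written as a Java string literal.
    // MULTILINE is an assumption added here so that ^ anchors at each line start.
    static final Pattern P =
            Pattern.compile("^(http:\\/\\/.*\\.m3u8)", Pattern.MULTILINE);

    // Returns capture group 1 of every match found in the text.
    static List<String> findAll(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the hosts are placeholders.
        String text = String.join("\n",
                "Some random text here",
                "http://example.invalid/a.m3u8",
                "http://example.invalid/b.m3u81",  // longer ending, prefix still matches
                "https://example.invalid/c.m3u8"); // https is not covered by http:\/\/
        for (String url : findAll(text)) {
            System.out.println(url);
        }
    }
}
```

Two things seem worth noting when running this: because the pattern has no end anchor, the `.m3u81` line still yields a match up to `.m3u8`, and because the scheme is spelled out as `http://`, the `https` line is not matched at all.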

Answer 3

This is @capnibishop's answer, with a slight change.

^(http://).*(/index_1)[^/]*\.m3u8$

Added the missing "$" anchor at the end. This ensures the pattern accepts an ending such as

http://something.m3u8

rather than

http://something.m3u81

Added a condition requiring index_1 in the last path segment (no further "/" may follow it), which means it will match:

http://something/index_1_something_else.m3u8

rather than

http://something/index_1/something_else.m3u8
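The claims above can be checked with a short Java sketch. The class `Answer3Check` and the method `matches` are hypothetical names, and each input is assumed to be a single line, so no `Pattern.MULTILINE` flag is used.

```java
import java.util.regex.Pattern;

class Answer3Check {
    // The pattern from this answer, written as a Java string literal.
    static final Pattern P =
            Pattern.compile("^(http://).*(/index_1)[^/]*\\.m3u8$");

    // True when the single input line satisfies the pattern.
    static boolean matches(String line) {
        return P.matcher(line).find();
    }

    public static void main(String[] args) {
        // index_1 in the last path segment: true
        System.out.println(matches("http://something/index_1_something_else.m3u8"));
        // a "/" follows index_1: false
        System.out.println(matches("http://something/index_1/something_else.m3u8"));
        // extra character after .m3u8, rejected by "$": false
        System.out.println(matches("http://something/index_1_av.m3u81"));
        // index_10 also passes, because [^/]* accepts the extra "0": true
        System.out.println(matches("http://something/index_10_av.m3u8"));
    }
}
```

Note that the full pattern always requires `/index_1`, so the shorter `http://something.m3u8` illustration only demonstrates the effect of `$` and would not match the full pattern on its own.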

Regex pattern to match a specific URL
