I have a large text, and I only want to use some of the information in it. The text is as follows:
Some random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_1_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_2_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_3_av.m3u8
I only want the http URLs. There are several of them in the text, but I only need one. The regex should be "starts with http and ends with .m3u8".
I looked through the reference of all the different expressions, but it is very confusing to me. I tried "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{12,30})([\/\w \.-]*)*\/?$/" as my pattern. But is that enough?
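One way to answer "is that enough?" is to run the pattern against one of the sample lines. Below is a minimal Java sketch, assuming the pattern is used without its JavaScript-style /.../ delimiters; the class name PatternCheck and the helper matchesLine are hypothetical. It compares the question's pattern with the plain "starts with http, ends with .m3u8" idea on a single line.

```java
import java.util.regex.Pattern;

public class PatternCheck {
    // The pattern from the question, with the /.../ delimiters removed and
    // backslashes doubled for a Java string literal.
    static final Pattern ASKER = Pattern.compile(
            "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{12,30})([\\/\\w \\.-]*)*\\/?$");

    // "Starts with http, ends with .m3u8", applied to one line at a time.
    static final Pattern SIMPLE = Pattern.compile("^http.*\\.m3u8$");

    // True if the whole line matches the given pattern.
    static boolean matchesLine(Pattern p, String line) {
        return p.matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8";
        System.out.println("question pattern: " + matchesLine(ASKER, line)); // false
        System.out.println("simple pattern:   " + matchesLine(SIMPLE, line)); // true
    }
}
```

The question's pattern fails here because ([a-z\.]{12,30}) demands at least 12 lowercase letters or dots right after a dot in the host name, and "xxx.net" supplies only 7. Its path class [\/\w \.-] also contains no comma, so the ",892,144,..." part of these URLs could not be matched either.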
Thanks for any help.
Answer 0 (score: 1)
Assuming your text is separated into lines exactly as shown in your example, here is a working snippet:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text =
            "Some random text here" +
            System.getProperty("line.separator") +
            "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
            System.getProperty("line.separator") +
            "More random text here" +
            System.getProperty("line.separator") +
            "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
            System.getProperty("line.separator") +
            // removed some for brevity
            "More random text here" +
            System.getProperty("line.separator") +
            // added counter-example ending with "NOPE"
            "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.NOPE";
        // Multi-line pattern:
        //                           ┌ line starts with http
        //                           |    ┌ any 1+ character reluctantly quantified
        //                           |    |  ┌ dot escape
        //                           |    |  |  ┌ ending text
        //                           |    |  |  |   ┌ end of line marker
        //                           |    |  |  |   |
        Pattern p = Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        // Print every line that starts with http and ends with .m3u8
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
Output
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
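The Pattern.MULTILINE flag is what lets ^ and $ match at every line boundary instead of only at the very start and end of the whole text. The hypothetical MultilineDemo below is a small sketch that counts matches of the same pattern with and without the flag, assuming "\n" line separators and shortened placeholder URLs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    static final String REGEX = "^http.+?\\.m3u8$";

    // Counts how many times the regex is found in the text with the given flags.
    static int countMatches(String text, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Some random text here",
                "http://example.net/a/index_0_av.m3u8",
                "More random text here",
                "http://example.net/a/index_1_av.m3u8");
        // Without MULTILINE, ^ only matches at the start of the whole text,
        // which begins with "Some", so nothing is found.
        System.out.println(countMatches(text, 0));                 // 0
        // With MULTILINE, ^ and $ match at the start and end of every line.
        System.out.println(countMatches(text, Pattern.MULTILINE)); // 2
    }
}
```

Because "." does not match line terminators by default, .+? can never run past the end of a line, so each match stays on a single line.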
Edit
For a finer "filter" on the URL's "index_x" file, you can simply add it to the pattern between the protocol and the end of the line, for example:
Pattern.compile("^http.+?index_0.+?\\.m3u8$", Pattern.MULTILINE);
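Since the question needs only one URL, the index can be passed in as a parameter and only the first hit returned. The findByIndex helper below is a hypothetical sketch built on the edit's pattern. It adds a negative lookahead (?!\d), which is not part of the answer, so that asking for index_1 does not also accept index_10.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexFilter {
    // Returns the first line that starts with http, contains index_<n>
    // not followed by another digit, and ends with .m3u8.
    static Optional<String> findByIndex(String text, int n) {
        Pattern p = Pattern.compile(
                "^http.+?index_" + n + "(?!\\d).*?\\.m3u8$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Some random text here",
                "http://example.net/a/index_0_av.m3u8",
                "http://example.net/a/index_10_av.m3u8",
                "http://example.net/a/index_1_av.m3u8");
        // prints http://example.net/a/index_1_av.m3u8
        System.out.println(findByIndex(text, 1).orElse("not found"));
    }
}
```

Concatenating the int directly into the pattern is safe here because it only ever contributes digits; arbitrary user text would need Pattern.quote first.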
Answer 1 (score: 0)
I haven't tested it, but this should do the trick:
^(http:\/\/.*\.m3u8)
Answer 2 (score: 0)
This is @capnibishop's answer, slightly modified.
^(http://).*(/index_1)[^/]*\.m3u8$
Added the missing "$" sign at the end. This ensures that it matches
http://something.m3u8
而不是
http://something.m3u81
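The effect of the trailing $ can be checked directly. The sketch below (the class DollarAnchorDemo and helper firstMatch are hypothetical) runs find() with answer 1's pattern and with the same pattern plus $, written as Java strings. The \/ escapes are dropped because / is not special in a Java regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarAnchorDemo {
    // Answer 1's pattern (no $) and the same pattern anchored with $.
    static final Pattern NO_DOLLAR = Pattern.compile("^(http://.*\\.m3u8)", Pattern.MULTILINE);
    static final Pattern WITH_DOLLAR = Pattern.compile("^(http://.*\\.m3u8)$", Pattern.MULTILINE);

    // Returns group 1 of the first match, or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String bad = "http://something.m3u81";
        // Without $, the prefix "http://something.m3u8" is still found.
        System.out.println(firstMatch(NO_DOLLAR, bad));   // http://something.m3u8
        // With $, the trailing "1" makes the line fail.
        System.out.println(firstMatch(WITH_DOLLAR, bad)); // null
        System.out.println(firstMatch(WITH_DOLLAR, "http://something.m3u8")); // http://something.m3u8
    }
}
```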
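Added a condition that "/index_1" must appear in the last path segment of the line (only non-slash characters may follow it before ".m3u8"), which means it will match: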
http://something/index_1_something_else.m3u8
而不是
http://something/index_1/something_else.m3u8
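The [^/]* between /index_1 and .m3u8 is what rules out any further "/". The hypothetical Index1Demo below is a sketch that checks answer 2's full pattern against the two example lines with matches(), which tests an entire line.

```java
import java.util.regex.Pattern;

public class Index1Demo {
    // Answer 2's pattern; '/' needs no escaping in a Java regex.
    static final Pattern ANSWER2 = Pattern.compile("^(http://).*(/index_1)[^/]*\\.m3u8$");

    // True if the whole line is accepted by answer 2's pattern.
    static boolean accepts(String line) {
        return ANSWER2.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("http://something/index_1_something_else.m3u8")); // true
        System.out.println(accepts("http://something/index_1/something_else.m3u8")); // false
    }
}
```

Note that [^/]* also lets a line such as http://something/index_10_av.m3u8 through, since "0_av" contains no slash; if that matters, the character right after index_1 has to be constrained as well.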