How can I filter a long (dynamic) string with a regular expression?

Time: 2015-04-28 10:13:37

Tags: java regex

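I have stored the response from a web application in a string. The string contains multiple URLs, and it is dynamic: there can be anywhere from 10 to 1000 URLs.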

I work in performance engineering, but this time I have to write plugin code in Java, and I am far from an expert in programming.

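The problem is that my response string contains a lot of gibberish that I don't need, and I don't know how to filter it out. In my print/request, I only want to send the URLs.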

What I have so far:

// Requires java.util.regex.Pattern and java.util.regex.Matcher.
String responseData = "http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65354-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment1_4_av.ts?null=" +
        "#EXTINF:10.000, " +
        "http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65365-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=" +
        "#EXTINF:fgsgsmoregiberish, " +
        "http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-6353-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=";

// Greedy pattern, anchored at the start of the string.
String pattern = "^(http://.*\\.ts)";

Pattern pr = Pattern.compile(pattern);
Matcher matcher = pr.matcher(responseData);

if (matcher.find()) {
    // This prints everything from the response. I only want the URLs
    // (dynamic; the names can differ, but they all start with http and end with .ts).
    System.out.println(matcher.group());
} else {
    System.out.println("No match");
}

3 answers:

Answer 0: (score: 2)

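Depending on what your URLs look like, you can use this naive pattern, which works for your example and stops before the ? (written as a Java string literal):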

\\bhttps?://[^?\\s]+

To make sure there is a .ts at the end, you can change it to either of the following:

\\bhttps?://[^?\\s]+\\.ts

\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)

The second pattern additionally checks that the end of the path has been reached.

Note that these patterns do not handle URLs that contain spaces between double quotes.
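Because the response holds many URLs, a single call to find() is not enough; the matcher has to be advanced in a loop. The following is a minimal sketch of that idea using the lookahead variant of the pattern. The class name TsUrlExtractor, the method name extract, and the example.invalid URLs are made up for illustration and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TsUrlExtractor {
    // URL up to (but not including) '?' or whitespace, ending in ".ts";
    // the lookahead requires '?', whitespace, or end of input right after ".ts".
    private static final Pattern TS_URL =
            Pattern.compile("\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)");

    // Collects every match in the input, in order of appearance.
    static List<String> extract(String responseData) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = TS_URL.matcher(responseData);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical response data shaped like the sample in the question.
        String responseData = "http://example.invalid/a/segment1_4_av.ts?null="
                + "#EXTINF:10.000, "
                + "http://example.invalid/a/segment2_4_av.ts?null=";
        for (String url : extract(responseData)) {
            System.out.println(url);
        }
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every response, which may matter when the plugin runs under load.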

Answer 1: (score: 0)

Use the following regex pattern:

(((http|ftp|https):\/{2})+(([0-9a-z_-]+\.)+([a-z]{2,4})(:[0-9]+)?((\/([~0-9a-zA-Z\#\+\%@\.\/_-]+))?(\?[0-9a-zA-Z\+\%@\/&\[\];=_-]+)?)?))\b

Explanation:

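  • Scheme: http, https, or ftp, followed by ://: ((http|ftp|https):\/{2})
  • The '+' sign after that group means it must occur one or more times before the next part of the same string
  • Host name labels, each followed by a dot: ([0-9a-z_-]+\.)
  • Top-level domain: ([a-z]{2,4})
  • Port number, occurring zero or one time (here ? means zero or one time): (:[0-9]+)?
  • Rest of the URL (path and query string), occurring zero or one time: ((\/([~0-9a-zA-Z\#\+\%@\.\/_-]+))?(\?[0-9a-zA-Z\+\%@\/&\[\];=_-]+)?)?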
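To try this pattern from Java, every backslash has to be doubled inside the string literal. The sketch below does that and returns the first match; the class name GenericUrlPattern, the method name firstMatch, and the example.com URLs are hypothetical. Note that the path character class does not include a comma, so a URL shaped like the ones in the question would be cut off at its first comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenericUrlPattern {
    // The pattern from this answer, with each backslash doubled for a Java string literal.
    private static final Pattern URL = Pattern.compile(
            "(((http|ftp|https):\\/{2})+(([0-9a-z_-]+\\.)+([a-z]{2,4})(:[0-9]+)?"
            + "((\\/([~0-9a-zA-Z\\#\\+\\%@\\.\\/_-]+))?"
            + "(\\?[0-9a-zA-Z\\+\\%@\\/&\\[\\];=_-]+)?)?))\\b");

    // Returns the first URL-like match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher matcher = URL.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // A simple URL is matched in full, including its query string.
        System.out.println(firstMatch("http://example.com/path/file.ts?x=1"));
        // A comma in the path is outside the character class, so the match stops before it.
        System.out.println(firstMatch("http://example.com/i/EPISOD_,892,144,.mp4.csmil/segment1_4_av.ts?null="));
    }
}
```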

Answer 2: (score: 0)

Just make your regex lazy by using .*? instead of the greedy .*, i.e.:

pr = Pattern.compile("(https?.*?\\.ts)");

Regex demo:

https://regex101.com/r/nQ5pA7/1

Regex explanation:

(https?.*?\.ts)

Match the regex below and capture its match into backreference number 1 «(https?.*?\.ts)»
   Match the character string “http” literally (case sensitive) «http»
   Match the character “s” literally (case sensitive) «s?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “.” literally «\.»
   Match the character string “ts” literally (case sensitive) «ts»
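The difference between the greedy and the lazy quantifier can be seen by running both patterns over the same input. This is a rough sketch only; the class name LazyVsGreedy, the helper findAll, and the example.invalid URLs are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical single-line response data shaped like the sample in the question.
        String data = "http://example.invalid/a/segment1_4_av.ts?null="
                + "#EXTINF:10.000, "
                + "http://example.invalid/a/segment2_4_av.ts?null=";

        // Greedy: .* runs to the end and backtracks to the LAST ".ts".
        List<String> greedy = findAll("(https?.*\\.ts)", data);
        // Lazy: .*? stops at the FIRST ".ts" after each "http".
        List<String> lazy = findAll("(https?.*?\\.ts)", data);

        System.out.println("greedy matches: " + greedy.size());
        System.out.println("lazy matches: " + lazy.size());
        for (String url : lazy) {
            System.out.println(url);
        }
    }
}
```

On this input the greedy pattern yields one long match spanning both URLs, while the lazy pattern yields two separate ones. The lazy version stops at the first .ts it sees, so a URL with .ts earlier in its host or path would be cut short.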