Java URL regex does not match

Time: 2013-03-19 18:52:01

Tags: java regex string url

I am trying to count the number of URLs in a Java string:

String test = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
String urlRegex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>";
int numUrls = 0;
Pattern pattern = Pattern.compile(urlRegex);
Matcher matcher = pattern.matcher(test);
while(matcher.find())
    numUrls++;
System.err.println("numUrls = " + numUrls);

When I run it, it reports zero URLs in the string instead of 2. Any idea why? Thanks in advance!

2 Answers:

Answer 0: (score: 5)

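The < and > characters in urlRegex are what keep your pattern from matching the test String. In a Java regex they are ordinary literal characters, so the pattern only matches a URL that is wrapped in angle brackets, and the input contains none. Removing them gives the expected numUrls value of 2.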
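To make the difference concrete, here is a minimal sketch that runs both versions of the pattern against the string from the question. The class name UrlCounter, the helper countMatches, and the two constant names are made up for illustration; only the regex and the test string come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class UrlCounter {
    // Pattern exactly as posted: the leading '<' and trailing '>' are literals.
    static final String BRACKETED_REGEX =
        "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>";

    // Same pattern with the angle brackets removed.
    static final String PLAIN_REGEX =
        "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

    // Counts the non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String test = "This http://example.com is a sentence "
                    + "https://secure.whatever.org that contains 2 URLs.";
        System.out.println("bracketed = " + countMatches(BRACKETED_REGEX, test)); // prints "bracketed = 0"
        System.out.println("plain = " + countMatches(PLAIN_REGEX, test));         // prints "plain = 2"
    }
}
```

The bracketed pattern is not wrong in itself: it would match input such as "<http://example.com>", which is probably what the source it was copied from was meant to handle.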

Answer 1: (score: 0)

Try this code:

    String data = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";

    Pattern pattern = Pattern.compile("[hH][tT]{2}[Pp][sS]?://(\\w+(\\.\\w+?)?)+");
    Matcher matcher = pattern.matcher(data);

    while (matcher.find()) {
        System.out.println(matcher.group());
    }

Hope it works.