Question

So I have the following code to pull out all URLs (http only) from the page source (String text):

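import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical enclosing class; the question shows only the method.
public class LinkCollector {

    // Matches http:// URLs only; compiled once instead of on every call
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\b(http)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    private synchronized void addLinks(String text) {
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            // group() returns the whole match, i.e. text.substring(matcher.start(), matcher.end())
            String urlStr = matcher.group();

            //do something
        }
    }
}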

I need to add something to the regex so that it only matches URLs that link to text pages. Is that possible?

Answer 1

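import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NewC {
    public static void main(String[] args) {
        // [^jpg], [^png] and [^gif] are negated character classes: each matches exactly
        // one character not listed in the brackets. The unescaped "." matches any
        // character, and "$" anchors the match to the end of the input.
        String URL_REGEX = "\\b((?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|].[^jpg][^png][^gif]$)";

        Pattern p = Pattern.compile(URL_REGEX);
        Matcher m = p.matcher(args[0]); // replace with the string to test
        if (m.find()) {
            System.out.println("String contains URL");
        }
    }
}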
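The tail of the answer's pattern, [^jpg][^png][^gif], does not exclude file extensions: each bracket is a negated character class that matches exactly one character not listed inside it. The sketch below (class and method names are hypothetical) checks that tail on its own so the behaviour is easy to see:

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // The tail of the answer's regex: three single-character negated classes.
    private static final Pattern TAIL = Pattern.compile("[^jpg][^png][^gif]");

    // True if the three-character string s is accepted by that tail.
    public static boolean tailAccepts(String s) {
        return TAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(tailAccepts("xyz")); // true: x is not j/p/g, y is not p/n/g, z is not g/i/f
        System.out.println(tailAccepts("JPG")); // true: the classes are case-sensitive
        System.out.println(tailAccepts("log")); // false: 'g' is excluded by [^gif]
    }
}
```

So a harmless path ending in "log" is rejected, while an upper-case "JPG" is accepted.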

This regex finds all URLs, including png, jpg and gif links.
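A regex only sees the characters of the URL, not what the server returns, so it cannot really tell a text page from an image. The closest approximation is to reject URLs that end in a known image extension. A minimal sketch, assuming http-only URLs as in the question and that .jpg, .png and .gif are the extensions to skip (class and method names are hypothetical): it reuses the question's character classes, wraps them in an atomic group (?>...) so the engine cannot backtrack to a shorter prefix such as "http://example.com/logo.PN", and then applies a negative lookbehind for the extension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonImageLinks {
    // Same character classes as the question's regex.
    // (?>...) commits to the longest URL, so no shorter prefix can slip past the check.
    // (?<!\.(?:jpg|png|gif)) then rejects URLs ending in an image extension (assumed list).
    // CASE_INSENSITIVE makes ".PNG" count as well.
    private static final Pattern NON_IMAGE_URL = Pattern.compile(
            "\\bhttp://(?>[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|])(?<!\\.(?:jpg|png|gif))",
            Pattern.CASE_INSENSITIVE);

    // Returns the http URLs in text that do not end in .jpg, .png or .gif.
    public static List<String> findNonImageLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = NON_IMAGE_URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "see http://example.com/index.html and http://example.com/logo.PNG "
                + "or http://example.com/photo.jpg, http://example.com/docs";
        // prints [http://example.com/index.html, http://example.com/docs]
        System.out.println(findNonImageLinks(page));
    }
}
```

This only inspects the end of the URL, so a link like http://example.com/a.png?w=100 is still kept, and so is any page served without an extension. An alternative that is often easier to maintain is to match with the question's original pattern and skip results whose lower-cased text ends with one of the extensions; telling pages apart by their actual Content-Type would require an HTTP request, which a regex cannot do.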
