Question

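I want to extract a part of a URL that sits in the middle of it, using a regular expression in Java. This is what I tried. The main problem is that the part I want to detect (regex+java) is in the middle of the last section of the URL, and I don't know how to ignore the characters that come after it; my regular expression only ignores what comes before it: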

    String regex = "https://www\\.google\\.com/(search)?q=([^/]+)/";
    String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
    Pattern pattern = Pattern.compile (regex);
    Matcher matcher = pattern.matcher (url);

    if (matcher.matches ())
    {
        int n = matcher.groupCount ();
        for (int i = 0; i <= n; ++i)
            System.out.println (matcher.group (i));
    }

The result should be regex+java, or even regex java. But my code doesn't work...

Answer 1

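Try the following. The `?` after `search` has to be escaped (unescaped, it makes the `(search)` group optional instead of matching a literal question mark), and the value of `q` ends at the next `&`, not at a `/`: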

    String regex = "https://www\\.google\\.com/search\\?q=([^&]+).*";
    String   url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
    Pattern pattern = Pattern.compile (regex);
    Matcher matcher = pattern.matcher (url);

    if (matcher.matches ())
    {
        int n = matcher.groupCount ();
        for (int i = 0; i <= n; ++i)
            System.out.println (matcher.group (i));
    }

The output is:

    https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
    regex+java

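Edit

To replace all plus signs with spaces before printing:

    for (int i = 0; i <= n; ++i)
    {
        // "\\+" is an escaped plus sign, because replaceAll takes a regex
        String str = matcher.group (i).replaceAll ("\\+", " ");
        System.out.println (str);
    }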
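The `+` characters are how form-encoded URLs represent spaces, so instead of replacing them by hand the captured value could be URL-decoded. What follows is only a minimal sketch, assuming Java 10 or later (for the `URLDecoder.decode(String, Charset)` overload); the class and method names `QueryExtractor` and `extractQuery` are made up for illustration and the URL is shortened:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class QueryExtractor {
    // Same pattern as above, without the trailing ".*" because find() is used.
    private static final Pattern Q_PARAM =
            Pattern.compile("https://www\\.google\\.com/search\\?q=([^&]+)");

    // Returns the decoded value of q, or null if the URL does not match.
    static String extractQuery(String url) {
        Matcher matcher = Q_PARAM.matcher(url);
        if (!matcher.find()) {
            return null;
        }
        // Turns '+' into a space and %XX escapes into their characters.
        return URLDecoder.decode(matcher.group(1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8";
        System.out.println(extractQuery(url)); // prints "regex java"
    }
}
```

Decoding also handles percent-escapes such as `%2B`, which a plain replacement of `+` would leave untouched.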

Answer 2

    String regex = "https://www\\.google\\.com/?(search)\\?q=([^&]+)?";
    String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";

    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(url);

    while (matcher.find()) {
        // group(2) is the value of q; group() would print the whole matched prefix
        System.out.println(matcher.group(2));
    }

This should help you.
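For context, `find()` accepts a match anywhere in the input, whereas `matches()` only succeeds when the pattern consumes the entire string, which is why a pattern used with `matches()` needs something like a trailing `.*`. A small sketch of the difference, using a shortened URL and a class name (`MatchVsFind`) invented for the demo:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from either answer.
class MatchVsFind {
    static final String URL =
            "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8";

    // Pattern that stops right after the value of q (no trailing ".*").
    static final Pattern PREFIX =
            Pattern.compile("https://www\\.google\\.com/search\\?q=([^&]+)");

    // matches() requires the pattern to cover the whole input.
    static boolean wholeMatch() {
        return PREFIX.matcher(URL).matches();
    }

    // find() only needs the pattern to occur somewhere in the input.
    static String firstGroupViaFind() {
        Matcher matcher = PREFIX.matcher(URL);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch());        // prints "false"
        System.out.println(firstGroupViaFind()); // prints "regex+java"
    }
}
```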

Extracting a specific part of a URL using a regular expression
