Question

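I wrote a regular expression to find web URLs in a text (full code below), but when I run the code the console prints only the last URL in the text. I don't know what is wrong, since I am using a while loop. Please see the code below and help me correct it. Thanks.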

import java.util.*;
import java.util.regex.*;

public class Main
{
    static String query = "This is a URL http://facebook.com" 
    + " and this is another, http://twitter.com "
    + "this is the last URL http://instagram.com"
    + " all these URLs should be printed after the code execution";

    public static void main(String args[])
    {
        String pattern = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);

        while(m.find())
        {
             System.out.println(m.group(2));
        }
    }
}

When I run the code above, only http://instagram.com is printed to the console.

Answer 1

I found another regex:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)

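It is written with https in mind, but the s is optional (https?), so it works for the http URLs in your case too.

I am using this code to print all three URLs: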

import java.util.regex.*;

public class Main {

    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    public static void main(String[] args) {
        String pattern = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);

        // group() with no argument returns the whole match, i.e. the full URL
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

Answer 2

I'm not sure how reliable this pattern is, but it prints all the URLs when I run your example.

(http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})

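You will have to modify it if you run into a URL like this:

http://www.instagram.com

because the pattern allows only one dot after http://, so it captures correctly only URLs without 'www' (for the URL above it would stop at http://www.ins).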
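
One possible way to handle the 'www' case is to let the host consist of several dot-separated labels. The following is only a sketch under that assumption: the class and method names (WwwTolerantUrls, findUrls) are made up for illustration, and the pattern still covers only plain http:// hosts, not paths or query strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WwwTolerantUrls {
    // Assumption: a host is one or more alphanumeric labels, each followed by a dot,
    // ending in a 2-3 letter suffix. Illustrative only, not a full URL grammar.
    private static final Pattern URL =
            Pattern.compile("http://(?:[A-Za-z0-9]+\\.)+[a-zA-Z]{2,3}");

    // Returns every substring of text that matches the URL pattern, in order.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "see http://www.instagram.com and http://twitter.com now";
        for (String url : findUrls(text)) {
            System.out.println(url);
        }
    }
}
```

With the sample text in main, this should print http://www.instagram.com and http://twitter.com on separate lines.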

Answer 3

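Your problem is that your regex quantifiers (i.e. the * and + characters) are greedy, which means they match as much as possible. You need to use reluctant quantifiers. See the corrected pattern below. It needs just two extra characters: a ? after each of the * and + quantifiers, which makes them match as little as possible.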

String pattern = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";
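
To see the difference side by side, here is a small sketch that runs both the original greedy pattern and the reluctant one against the text from the question. The class name GreedyVsReluctant and the helper method urls are invented for this illustration; the two patterns and the query string are taken from the posts above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    static final String QUERY = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    // Collects group 2 (the URL part) of every match of the given regex.
    static List<String> urls(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Greedy: the leading group swallows the text up to the last "http://".
        String greedy = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
        // Reluctant: each quantifier stops as early as it can.
        String reluctant = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";

        System.out.println("greedy:    " + urls(greedy, QUERY));
        System.out.println("reluctant: " + urls(reluctant, QUERY));
    }
}
```

The greedy version finds a single match whose URL part is http://instagram.com, because the first group has already consumed the two earlier URLs; the reluctant version yields all three URLs in order.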

Answer 4

Maybe you are looking for this regex:

http://(\w+(?:\.\w+)+)

For example, from this string:

http://ww1.amazon.com and http://npr.org

it extracts:

"ww1.amazon.com"
"npr.org"

To break down how it works:

http:// matches the literal scheme prefix.
\w+ matches the first label of the host name (one or more word characters).
(?:\.\w+)+ matches one or more further labels, each introduced by a literal dot.
The outer parentheses capture the whole host name as group 1.

Hope this helps.
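
For reference, the pattern http://(\w+(?:\.\w+)+) from this answer could be tried in Java roughly as follows. This is a sketch, not the answerer's own code: the class name HostExtractor and the method hosts are hypothetical, and the backslashes are doubled because the regex sits inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostExtractor {
    // http:// followed by a captured host: one label, then one or more ".label" parts.
    private static final Pattern URL = Pattern.compile("http://(\\w+(?:\\.\\w+)+)");

    // Returns the captured host name (group 1) of every URL found in text.
    static List<String> hosts(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String host : hosts("http://ww1.amazon.com and http://npr.org")) {
            System.out.println(host);
        }
    }
}
```

Using m.group() instead of m.group(1) would return the full URL including the http:// prefix.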

Answer 5

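I hope this clears things up for you: you are matching too many characters. Your pattern should be as restrictive as possible, because regex is greedy and will match as much as it can.

Here is my take on your code: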

import java.util.regex.*;

public class Main {

    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    public static void main(String args[]) {
        // The dot before "com" is escaped so that it matches a literal dot only.
        String pattern = "(http:[/][/][Ww.]*[a-zA-Z]+\\.com)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);

        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}

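The code above will only match your examples; if you want to match more, you will need to adjust it to your needs.

A good way to test a pattern is http://www.regexpal.com/. You can paste your pattern there and refine it until it matches exactly what you want. Just remember that in a Java string literal each \ must be written as a double \\ to escape characters.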
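
As a small illustration of the escaping rule: the regex \w+\.com, as you would type it into an online tester, becomes "\\w+\\.com" in Java source. The sketch below uses a made-up class name (EscapingDemo) and sample strings chosen only for the demonstration; it also shows what changes when the dot is left unescaped.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // The regex \w+\.com written as a Java string literal: every backslash is doubled.
    static final Pattern ESCAPED_DOT = Pattern.compile("\\w+\\.com");
    // Without the backslash, the dot matches any single character.
    static final Pattern BARE_DOT = Pattern.compile("\\w+.com");

    public static void main(String[] args) {
        System.out.println(ESCAPED_DOT.matcher("twitter.com").matches()); // true
        System.out.println(ESCAPED_DOT.matcher("twitterXcom").matches()); // false
        System.out.println(BARE_DOT.matcher("twitterXcom").matches());    // true
    }
}
```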

My regex search only prints out the last match
