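I wrote a regular expression to search for web URLs in a text (full code below), but when I run the code the console prints only the last URL in the text. I don't know what is wrong, since I am using a while loop. Please see the code below and help me correct it. Thanks.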
import java.util.*;
import java.util.regex.*;

public class Main
{
    static String query = "This is a URL http://facebook.com"
        + " and this is another, http://twitter.com "
        + "this is the last URL http://instagram.com"
        + " all these URLs should be printed after the code execution";

    public static void main(String args[])
    {
        String pattern = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);
        while(m.find())
        {
            System.out.println(m.group(2));
        }
    }
}
When I run the code above, only http://instagram.com is printed to the console.
Answer 0 (score: 1)
I found another regex:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
It also looks for https, but it works in your case as well.
I am using this code to print all 3 URLs:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static String query = "This is a URL http://facebook.com"
        + " and this is another, http://twitter.com "
        + "this is the last URL http://instagram.com"
        + " all these URLs should be printed after the code execution";

    public static void main(String[] args) {
        // Same regex as above, with each backslash doubled for the Java string literal.
        String pattern = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);
        while (m.find()) {
            // group() with no argument returns the whole match, i.e. the full URL.
            System.out.println(m.group());
        }
    }
}
Answer 1 (score: 0)
I'm not sure how reliable this pattern is, but it printed all the URLs when I ran your example.
(http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})
If you run into URLs like the following, you will have to modify it:
http://www.instagram.com
because it only fully captures URLs without 'www'.
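To make the 'www' limitation concrete, here is a small sketch that runs the pattern from this answer next to one possible modification. The WITH_WWW variant, the class name WwwUrlDemo, and the sample text are my own assumptions for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WwwUrlDemo {
    // Pattern from the answer above, as a Java string literal.
    static final Pattern ORIGINAL =
        Pattern.compile("(http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})");

    // Hypothetical variant: allow an optional "www." before the host name.
    static final Pattern WITH_WWW =
        Pattern.compile("(http://(?:www\\.)?[A-Za-z0-9]+\\.[a-zA-Z]{2,3})");

    // Collects group 1 of every match found in the text.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "see http://www.instagram.com and http://twitter.com";
        // The original pattern treats "www" as the host and stops after "ins".
        System.out.println(findAll(ORIGINAL, text)); // [http://www.ins, http://twitter.com]
        System.out.println(findAll(WITH_WWW, text)); // [http://www.instagram.com, http://twitter.com]
    }
}
```

The variant still handles only a single-label host with a 2 or 3 letter ending, so it is no more general than the original in other respects.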
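Answer 2 (score: 0)
Your problem is that your regex quantifiers (i.e. the * and + characters) are greedy, which means they match as much as possible. You need to use reluctant quantifiers. See the corrected pattern below: just two extra characters, a ? after each of the * and + characters, make them match as little as possible.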
String pattern = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";
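As a rough illustration of the difference, the sketch below runs the question's greedy pattern and the reluctant one above against the same text. The class name GreedyVsReluctant and the helper urls are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Text from the question.
    static final String QUERY = "This is a URL http://facebook.com"
        + " and this is another, http://twitter.com "
        + "this is the last URL http://instagram.com"
        + " all these URLs should be printed after the code execution";

    // Pattern from the question: * and + are greedy.
    static final String GREEDY = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
    // Pattern from this answer: *? and +? are reluctant.
    static final String RELUCTANT = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";

    // Hypothetical helper: collects group 2 (the URL) of every match.
    static List<String> urls(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: group 1 swallows everything up to the last "http://",
        // so find() succeeds only once.
        System.out.println(urls(GREEDY, QUERY));    // [http://instagram.com]
        // Reluctant: each find() stops at the nearest URL.
        System.out.println(urls(RELUCTANT, QUERY)); // [http://facebook.com, http://twitter.com, http://instagram.com]
    }
}
```

Note that the final .com still uses an unescaped dot in both patterns, so it matches any character followed by "com"; writing \\.com would make the dot literal.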
Answer 3 (score: 0)
Maybe you are looking for this regex:
http://(\w+(?:\.\w+)+)
For example, from this string:
http://ww1.amazon.com and http://npr.org
it extracts
"ww1.amazon.com"
"npr.org"
To break down how it works: http:// matches the literal scheme, \w+ matches the first label of the host name, and (?:\.\w+)+ matches one or more further labels, each a dot followed by word characters. The outer parentheses capture the whole host name without the scheme.
Hope this helps.
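In Java, this regex could be used roughly as follows. The class name HostExtractor and the method hosts are invented for this sketch; group(1) returns the captured host name, while group() would return the full match including http://.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostExtractor {
    // The regex from this answer, with backslashes doubled for Java.
    static final Pattern URL = Pattern.compile("http://(\\w+(?:\\.\\w+)+)");

    // Hypothetical helper: returns the host name of every URL in the text.
    static List<String> hosts(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            out.add(m.group(1)); // capture group 1: host without the scheme
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "http://ww1.amazon.com and http://npr.org";
        System.out.println(hosts(text)); // [ww1.amazon.com, npr.org]
    }
}
```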
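Answer 4 (score: 0)
I hope this clears things up for you: you are matching too many characters. Your match should be as restrictive as possible, because regex is greedy and will match as much as it can.
Here is my take on your code: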
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static String query = "This is a URL http://facebook.com"
        + " and this is another, http://twitter.com "
        + "this is the last URL http://instagram.com"
        + " all these URLs should be printed after the code execution";

    public static void main(String args[]) {
        // The dot before "com" is escaped so that it matches a literal '.' only.
        String pattern = "(http:[/][/][Ww.]*[a-zA-Z]+\\.com)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
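The code above will only match your examples; if you want to match more, you will need to adjust it to your needs.
A good way to test patterns is http://www.regexpal.com/. You can paste your pattern there to check that it matches exactly what you want. Just remember that in Java string literals you have to replace each \ with a double \\ to escape characters.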
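The escaping rule can be checked with a short sketch like the one below. The class name EscapeDemo, the helper found, and the test string http://telecom are assumptions chosen only to show what an unescaped dot does.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the regex matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "http://telecom"; // contains no ".com" at all

        // Unescaped dot: '.' matches any character, so "ecom" satisfies ".com".
        System.out.println(found("http://[a-z]+.com", text));    // true

        // Regex \.com is written "\\.com" in a Java string literal.
        System.out.println(found("http://[a-z]+\\.com", text));  // false

        // Pattern.quote builds a literal pattern without manual escaping.
        System.out.println(found("http://[a-z]+" + Pattern.quote(".com"), text)); // false
    }
}
```

A pattern copied from an online tester usually needs exactly this doubling of backslashes before it can be pasted into Java source.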