Question

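I want to apply the following regular expression to a string. It works fine in Grant Skinner's RegExr, and it also works on http://www.regexplanet.com/advanced/java/index.html (with case insensitivity set), but Java won't swallow it: execution never enters the while loop. Here is my code: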

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        final String testString =
            "lorem upsadsad asda 12esadas test@test.com asdlawaljkads test[at]test" +
            "[dot]com test jasdsa meter";
        final Pattern ptr =
            Pattern.compile(
                "^[A-Z0-9\\._%+-]+(@|\\s*\\[\\s*at\\s*\\]\\s*)[A-Z0-9\\.-]+" +
                "(\\.|\\s*\\[\\s*dot\\s*\\]\\s*)[a-z]{2,6}$",
                Pattern.CASE_INSENSITIVE);

        try {
            final Matcher mat = ptr.matcher(testString);
            while (mat.find()) {
                final String group1 = mat.group(1);
                System.out.println(group1);
                final String group2 = mat.group(2);
                System.out.println(group2);
                final String group3 = mat.group(3);
                System.out.println(group3);
            }
        } catch (final Exception e) {
            e.printStackTrace();
        }
    }
}

Answer 1

No complex regex is needed. As other users have suggested, replace "[dot]" with "." and "[at]" with "@", i.e.:

myAddressLine = myAddressLine.replace("[dot]", ".").replace("[at]","@");
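A minimal sketch of this normalization step, applied to the question's test string. `String.replace(CharSequence, CharSequence)` replaces literally, so the brackets need no escaping. The plain replacement does not cover spaced or upper-case variants such as "[ AT ]", which the question's regex tolerated via `\\s*` and `CASE_INSENSITIVE`; the `normalizeLoose` helper is a hypothetical alternative (not part of the answer) that uses `replaceAll` with the same tolerance. The class and method names are illustrative only.

```java
class NormalizeDemo {
    // Literal replacement, as suggested in the answer (case-sensitive, no regex).
    static String normalize(String s) {
        return s.replace("[dot]", ".").replace("[at]", "@");
    }

    // Hypothetical variant: also absorbs surrounding whitespace and ignores case, e.g. "[ AT ]".
    static String normalizeLoose(String s) {
        return s.replaceAll("(?i)\\s*\\[\\s*dot\\s*\\]\\s*", ".")
                .replaceAll("(?i)\\s*\\[\\s*at\\s*\\]\\s*", "@");
    }

    public static void main(String[] args) {
        String testString =
            "lorem upsadsad asda 12esadas test@test.com asdlawaljkads test[at]test" +
            "[dot]com test jasdsa meter";
        // Both addresses now read "test@test.com".
        System.out.println(normalize(testString));
        // The literal version leaves spaced/upper-case forms untouched...
        System.out.println(normalize("test [ AT ] test [DOT] com"));
        // ...while the regex-based version normalizes them too.
        System.out.println(normalizeLoose("test [ AT ] test [DOT] com"));
    }
}
```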

Now we can simplify your regex to:

Pattern.compile(
"\\b([a-z0-9._%+-]+)@([a-z0-9.-]+)\\.([a-z]{2,6})\\b", Pattern.CASE_INSENSITIVE);
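Putting the replacement and the simplified pattern together gives roughly the following; a sketch assuming the input is the question's test string. The `extract` helper and the "user|domain|tld" output format are illustrative choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplifiedPatternDemo {
    // The answer's pattern: group 1 = user name, group 2 = domain, group 3 = top-level domain.
    static final Pattern EMAIL = Pattern.compile(
        "\\b([a-z0-9._%+-]+)@([a-z0-9.-]+)\\.([a-z]{2,6})\\b",
        Pattern.CASE_INSENSITIVE);

    // Normalizes "[at]"/"[dot]" first, then collects "user|domain|tld" for every match.
    static List<String> extract(String text) {
        String normalized = text.replace("[dot]", ".").replace("[at]", "@");
        Matcher m = EMAIL.matcher(normalized);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
        }
        return found;
    }

    public static void main(String[] args) {
        String testString =
            "lorem upsadsad asda 12esadas test@test.com asdlawaljkads test[at]test" +
            "[dot]com test jasdsa meter";
        // Prints "test|test|com" twice: once per address form in the input.
        extract(testString).forEach(System.out::println);
    }
}
```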

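\\b is a word boundary, which is what you want here instead of "^" and "$"; those anchor the match to the start and the end of the whole input, respectively, so a pattern using them can only match when the entire string is an address.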
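A small sketch of the difference, reusing the question's pattern as-is and a copy with `^`/`$` swapped for `\\b`. Without `Pattern.MULTILINE`, `^` can only match at index 0, so `find()` on the longer sentence fails, while the same pattern succeeds on a string that is nothing but the address. The class and field names are illustrative.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The question's pattern: ^ and $ force the match to cover the entire input.
    static final Pattern ANCHORED = Pattern.compile(
        "^[A-Z0-9\\._%+-]+(@|\\s*\\[\\s*at\\s*\\]\\s*)[A-Z0-9\\.-]+" +
        "(\\.|\\s*\\[\\s*dot\\s*\\]\\s*)[a-z]{2,6}$",
        Pattern.CASE_INSENSITIVE);

    // Same pattern with word boundaries: the match may sit anywhere in the text.
    static final Pattern BOUNDED = Pattern.compile(
        "\\b[A-Z0-9._%+-]+(@|\\s*\\[\\s*at\\s*\\]\\s*)[A-Z0-9.-]+" +
        "(\\.|\\s*\\[\\s*dot\\s*\\]\\s*)[a-z]{2,6}\\b",
        Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String testString =
            "lorem upsadsad asda 12esadas test@test.com asdlawaljkads test[at]test" +
            "[dot]com test jasdsa meter";
        System.out.println(ANCHORED.matcher(testString).find());      // false
        System.out.println(ANCHORED.matcher("test@test.com").find()); // true
        System.out.println(BOUNDED.matcher(testString).find());       // true
    }
}
```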

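Note that my capturing groups differ from yours. Before, you were capturing the "@" and the "[dot]" etc. Now the groups capture the "user name", the "domain name" and the "top-level domain", which I assume is what you want.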
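To make the group numbering concrete, here is a sketch comparing the two patterns on a bare address. The question's pattern has only two capturing groups (the two separator alternations), so its `mat.group(3)` call would throw `IndexOutOfBoundsException` even once the anchoring problem is fixed. Class and field names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // The question's pattern: only the "@"/"[at]" and "."/"[dot]" alternations are captured.
    static final Pattern ORIGINAL = Pattern.compile(
        "^[A-Z0-9\\._%+-]+(@|\\s*\\[\\s*at\\s*\\]\\s*)[A-Z0-9\\.-]+" +
        "(\\.|\\s*\\[\\s*dot\\s*\\]\\s*)[a-z]{2,6}$",
        Pattern.CASE_INSENSITIVE);

    // The answer's pattern: user name, domain and top-level domain are captured.
    static final Pattern SIMPLIFIED = Pattern.compile(
        "\\b([a-z0-9._%+-]+)@([a-z0-9.-]+)\\.([a-z]{2,6})\\b",
        Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        Matcher orig = ORIGINAL.matcher("test@test.com");
        if (orig.find()) {
            System.out.println(orig.groupCount()); // 2
            System.out.println(orig.group(1));     // @
            System.out.println(orig.group(2));     // .
            // orig.group(3) would throw IndexOutOfBoundsException: there is no group 3.
        }
        Matcher simp = SIMPLIFIED.matcher("test@test.com");
        if (simp.find()) {
            System.out.println(simp.group(1) + " " + simp.group(2) + " " + simp.group(3)); // test test com
        }
    }
}
```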

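Note: you don't need to escape special characters inside a character class, i.e. [.] already means a literal period, so [\\.] is unnecessary. It still works, though, because escaping a period is harmless; you would need \\\\ to actually match a backslash, as {{3}} explains.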
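A quick way to convince yourself of this, using `Pattern.matches(String, CharSequence)`, which requires the whole input to match. The class name is illustrative.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    public static void main(String[] args) {
        // Inside a character class '.' is already a literal period.
        System.out.println(Pattern.matches("[.]", "."));   // true
        System.out.println(Pattern.matches("[.]", "x"));   // false
        // The escaped form is redundant but equivalent.
        System.out.println(Pattern.matches("[\\.]", "."));  // true
        System.out.println(Pattern.matches("[\\.]", "x"));  // false
        // A literal backslash needs "\\\\" in Java source, i.e. the regex \\ .
        System.out.println(Pattern.matches("\\\\", "\\"));  // true
    }
}
```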

Answer 2

final Pattern ptr = Pattern.compile(
    "\\b([A-Z0-9\\._%+-]+)"+
    "(?:@|\\s*\\[\\s*at\\s*\\]\\s*)"+
    "([A-Z0-9\\.-]+)"+
    "(?:\\.|\\s*\\[\\s*dot\\s*\\]\\s*)"+
    "([a-z]{2,6})\\b", Pattern.CASE_INSENSITIVE);
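This pattern works on the raw text, without the replacement step: `\\b` replaces the anchors, and the `(?:...)` non-capturing groups keep the separators out of the numbering, so groups 1 to 3 are the user name, domain and top-level domain. A sketch applying it to the question's test string; the `extract` helper and output format are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ObfuscatedPatternDemo {
    // Answer 2's pattern, unchanged.
    static final Pattern PTR = Pattern.compile(
        "\\b([A-Z0-9\\._%+-]+)" +
        "(?:@|\\s*\\[\\s*at\\s*\\]\\s*)" +
        "([A-Z0-9\\.-]+)" +
        "(?:\\.|\\s*\\[\\s*dot\\s*\\]\\s*)" +
        "([a-z]{2,6})\\b", Pattern.CASE_INSENSITIVE);

    // Collects "user|domain|tld" for every address, plain or obfuscated.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher mat = PTR.matcher(text);
        while (mat.find()) {
            found.add(mat.group(1) + "|" + mat.group(2) + "|" + mat.group(3));
        }
        return found;
    }

    public static void main(String[] args) {
        String testString =
            "lorem upsadsad asda 12esadas test@test.com asdlawaljkads test[at]test" +
            "[dot]com test jasdsa meter";
        // Prints "test|test|com" for "test@test.com" and again for "test[at]test[dot]com".
        extract(testString).forEach(System.out::println);
    }
}
```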

Answer 3

To simplify your regex, I would first replace [at] and [dot] with the actual characters, then use a standard email regex such as:

matches("(?i)\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b");
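One caveat worth checking before using this with `String.matches`: that method succeeds only if the entire string is an address, so it suits validating a single extracted value. To locate addresses inside longer text (as in the question), `Pattern.compile(...).matcher(...).find()` is needed. A sketch; the sample sentence and class name are illustrative.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Answer 3's regex, with the case-insensitive flag embedded as (?i).
    static final String EMAIL = "(?i)\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b";

    public static void main(String[] args) {
        String sentence = "contact test@test.com today";
        // matches() requires the whole input to be an address.
        System.out.println("test@test.com".matches(EMAIL)); // true
        System.out.println(sentence.matches(EMAIL));        // false
        // find() locates an address anywhere in longer text.
        System.out.println(Pattern.compile(EMAIL).matcher(sentence).find()); // true
    }
}
```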

Regexp doesn't work
