Question

I'm trying to use a regular expression to match a specific URL format, specifically the Stack Exchange API URLs. For example, I want both of these to match:

http://api.stackoverflow.com/1.1/questions/1234/answers  
http://api.physics.stackexchange.com/1.0/questions/5678/answers

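Where:

Everything except the variable parts listed below must be identical.
The first variable part (the site name after "api.", e.g. stackoverflow or physics.stackexchange) may contain only the letters a to z, plus either one period or none.
- It would also be nice if a period, when present, had to be followed by the word "stackexchange". However, this isn't essential.
The second variable part (the minor version after "1.") can only be 1 or 0.
The last variable part (the question ID) can only be the digits 0 to 9, and can be of any length.
There must be nothing at all before or after the URL, not even a trailing slash.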
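The optional rule for the site-name part allows two readings. A minimal sketch, assuming that part is tested on its own (the class name SubdomainRules and both patterns are illustrative, not from the question), compares a loose reading, taken here as letters with at most one period between two runs of letters, with the strict reading where a period must be followed by stackexchange:

```java
import java.util.regex.Pattern;

public class SubdomainRules {
    // Loose reading (an assumption): letters a-z, optionally one period followed by more letters.
    static final Pattern LOOSE = Pattern.compile("[a-z]+(?:\\.[a-z]+)?");
    // Strict reading: if there is a period, it must be followed by "stackexchange".
    static final Pattern STRICT = Pattern.compile("[a-z]+(?:\\.stackexchange)?");

    public static void main(String[] args) {
        String[] parts = {"stackoverflow", "physics.stackexchange", "physics.example", "a.b.c"};
        for (String p : parts) {
            // matches() tests the whole string, so no anchors are needed in the patterns.
            System.out.println(p + ": loose=" + LOOSE.matcher(p).matches()
                + " strict=" + STRICT.matcher(p).matches());
        }
    }
}
```

Only the strict pattern rejects physics.example; neither accepts a second period as in a.b.c.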

Answer 1

Pattern.compile("^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)/1\\.[01]/questions/[0-9]+/answers\\z")

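The ^ ensures the match starts at the beginning of the input, and \\z ensures it ends at the end of the input. All the dots are escaped so that they are literal. The (?i:...) part makes the scheme and domain case-insensitive, as the URL specification requires. [01] matches only the character 0 or 1. [0-9]+ matches one or more ASCII digits. The rest is self-explanatory.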
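A small runnable check of Answer 1's pattern might look like this. The class and method names are hypothetical; the inputs are the question's two examples plus a few near-misses. Because (?i:...) wraps only the scheme and host, the path stays case-sensitive:

```java
import java.util.regex.Pattern;

public class AnswerOneDemo {
    // Answer 1's pattern, unchanged: anchored with ^ and \z, case-insensitive
    // only for the scheme and host inside (?i:...).
    static final Pattern API_URL = Pattern.compile(
        "^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)/1\\.[01]/questions/[0-9]+/answers\\z");

    static boolean isApiUrl(String s) {
        // find() is enough because ^ and \z already pin the match to the whole input.
        return API_URL.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://api.stackoverflow.com/1.1/questions/1234/answers",         // true
            "http://api.physics.stackexchange.com/1.0/questions/5678/answers", // true
            "HTTP://API.StackOverflow.COM/1.1/questions/1234/answers",         // true: host is case-insensitive
            "http://api.stackoverflow.com/1.1/QUESTIONS/1234/answers",         // false: path is case-sensitive
            "http://api.stackoverflow.com/1.1/questions/1234/answers/",        // false: trailing slash
            "http://api.stackoverflow.com/1.2/questions/1234/answers",         // false: only 1.0 or 1.1
        };
        for (String s : inputs) {
            System.out.println(isApiUrl(s) + "  " + s);
        }
    }
}
```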

Answer 2

^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$

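^ matches the start of the string and $ matches its end (strictly, without the MULTILINE flag Java's $ also matches just before a final line terminator). [.] is an alternative way to escape the dot instead of a backslash, which would itself need escaping as \\. in a Java string.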
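A minimal sketch, with hypothetical class and helper names, showing that [.] and \\. match the same literal dot, and that Java's $ (without MULTILINE) also matches just before a final line terminator, which an illustrative \z variant (not part of Answer 2) rejects:

```java
import java.util.regex.Pattern;

public class AnswerTwoDemo {
    // Answer 2's pattern as a Java string literal; [.] needs no backslash at all.
    static final String WITH_DOLLAR =
        "^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$";

    // Same pattern ending in \z instead of $ (an illustrative variant).
    static final String WITH_Z =
        "^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers\\z";

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String url = "http://api.stackoverflow.com/1.1/questions/1234/answers";
        // [.] and \\. both match only a literal dot.
        System.out.println(found("a[.]b", "a.b") + " " + found("a\\.b", "a.b")); // true true
        System.out.println(found("a[.]b", "axb") + " " + found("a\\.b", "axb")); // false false
        // $ still matches before the trailing newline; \z does not.
        System.out.println(found(WITH_DOLLAR, url + "\n")); // true
        System.out.println(found(WITH_Z, url + "\n"));      // false
    }
}
```

If the input may come from a line-based source, \z (or Matcher.matches()) is the safer way to enforce "nothing after the URL".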

Answer 3

This tested Java program has a commented regex that should do the trick:

import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String s = "http://api.stackoverflow.com/1.1/questions/1234/answers";

        Pattern p = Pattern.compile(
            "http://api\\.              # Scheme and api subdomain.\n" +
            "(?:                        # Group for domain alternatives.\n" +
            "  stackoverflow            # Either one\n" +
            "| physics\\.stackexchange  # or the other\n" +
            ")                          # End group for domain alternatives.\n" +
            "\\.com                     # TLD\n" +
            "/1\\.[01]                  # Either 1.0 or 1.1\n" +
            "/questions/\\d+/answers    # Rest of path.", 
            Pattern.COMMENTS);
        Matcher m = p.matcher(s);
        if (m.matches()) {
            System.out.println("Match found.");
        } else {
            System.out.println("No match found.");
        }
    }
}
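The same COMMENTS-mode style can express the question's general rule instead of two fixed sites. This variant is a sketch (the class name CommentedVariant and the generalized domain part are illustrative, not the answer's own code); it accepts any letters-only site name, optionally followed by .stackexchange:

```java
import java.util.regex.Pattern;

public class CommentedVariant {
    // Pattern.COMMENTS ignores whitespace and treats # to end of line as a comment.
    static final Pattern API_URL = Pattern.compile(
        "http://api\\.              # Scheme and api subdomain.\n" +
        "[a-z]+                     # Site name, letters only.\n" +
        "(?:\\.stackexchange)?      # Optional .stackexchange after a period.\n" +
        "\\.com                     # TLD\n" +
        "/1\\.[01]                  # Either 1.0 or 1.1\n" +
        "/questions/\\d+/answers    # Rest of path.\n",
        Pattern.COMMENTS);

    static boolean isApiUrl(String s) {
        // matches() requires the whole input to match, so no ^ or \z is needed.
        return API_URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isApiUrl("http://api.stackoverflow.com/1.1/questions/1234/answers"));         // true
        System.out.println(isApiUrl("http://api.physics.stackexchange.com/1.0/questions/5678/answers")); // true
        System.out.println(isApiUrl("http://api.superuser.com/1.0/questions/42/answers"));               // true
        System.out.println(isApiUrl("http://api.stackoverflow.com/1.1/questions/1234/answers/"));        // false
    }
}
```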

Regex for a specific URL format
