Matching a URL with RegEx

Time: 2015-10-29 19:17:49

Tags: regex

I am writing a regular expression to extract a URL from the emails that my monitoring system generates automatically. For example:

https://mon.contoso.com/mon/call.py?fn=edit&num=1389896156

I need a regular expression that matches:

https://mon.contoso.com/mon/call.py?fn=edit&num=XXXXXXXXX

where the "X" digits change every time. I am having trouble with the "?". The goal is to attach the URL to a field in JIRA.

2 answers:

Answer 0: (score: 2)

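// Requires java.util.regex.Pattern and java.util.regex.Matcher.
// "." and "?" are regex metacharacters, so they are escaped to match literally.
Pattern p = Pattern.compile("https://mon\\.contoso\\.com/mon/call\\.py\\?fn=edit&num=(\\d+)");
Matcher m = p.matcher(inputEmail);
// find() scans the email for the URL; matches() would require the whole input to be the URL.
return m.find() ? m.group(1) : "";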

This returns num if it is numeric; otherwise you may want to use \w instead of \d. If you need the whole URL, drop the argument to group().
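For comparison, here is a minimal Python sketch of the same approach. It assumes the monitoring URL always has the shape shown in the question; the function name `extract_monitor_url` and the sample email text are made up for illustration. It also shows the likely cause of the trouble with "?": unescaped, it makes the preceding `y` optional instead of matching a literal question mark.

```python
import re

# Literal prefix from the question; re.escape() backslash-escapes "." and "?"
# so they match themselves instead of acting as regex metacharacters.
PREFIX = "https://mon.contoso.com/mon/call.py?fn=edit&num="
URL_RE = re.compile(re.escape(PREFIX) + r"(\d+)")


def extract_monitor_url(email_body):
    """Return (whole_url, num) for the first match, or None if there is none."""
    m = URL_RE.search(email_body)  # search() scans anywhere in the text
    if m is None:
        return None
    return m.group(0), m.group(1)  # group(0) is the full URL, group(1) the digits


# Hypothetical email text, for illustration only.
sample = "Alert raised. Edit at https://mon.contoso.com/mon/call.py?fn=edit&num=1389896156 now"
print(extract_monitor_url(sample))

# The unescaped pattern fails on the real URL because "y?" means "optional y".
naive = re.compile(r"call.py?fn=edit")
print(naive.search(sample))           # None
print(naive.search("call.pfn=edit"))  # matches, which is not what was intended
```

If num can contain letters as well as digits, swapping `\d+` for `\w+` in `URL_RE` is the only change needed.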

Answer 1: (score: 1)

You did not say which language you are using.

In Python and JavaScript, this regular expression will recognize a wide variety of URLs:

/\[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])|(?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,})[./\w\d:/?#\[\]@!$&'()*+,;=\-~%]*/gi

See this regex101 test for an example of the regular expression in use.
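The `/.../gi` form above is JavaScript literal syntax. A rough Python equivalent might look like the sketch below: the pattern body is copied unchanged, `re.IGNORECASE` stands in for the `i` flag, and `findall()` plays the role of `g`. The helper name `find_urls` and the sample text are illustrative assumptions.

```python
import re

# Pattern body copied from the answer, without the surrounding /.../gi.
GENERIC_URL_RE = re.compile(
    r"\[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])"
    r"|(?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,})"
    r"[./\w\d:/?#\[\]@!$&'()*+,;=\-~%]*",
    re.IGNORECASE,
)


def find_urls(text):
    """Return every non-overlapping match.

    The pattern has no capturing groups, so findall() yields whole matches.
    """
    return GENERIC_URL_RE.findall(text)


# Hypothetical email text, for illustration only.
sample = "New alert, see https://mon.contoso.com/mon/call.py?fn=edit&num=1389896156 for details"
print(find_urls(sample))
```

Because the trailing character class also accepts `.`, `,` and `)`, punctuation that directly follows a URL in running text is swallowed into the match, so some trimming afterwards may be needed.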


Explanation:

/\[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])|(?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,})[./\w\d:/?#\[\]@!$&'()*+,;=\-~%]*/gi
    1st Alternative: \[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])
        \[ matches the character [ literally
        [^\]\n]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \] matches the character ] literally
            \n matches a line-feed (newline) character (ASCII 10)
        \] matches the character ] literally
        (?:\([^\)\n]+\)|\[[^\]\n]+\]) Non-capturing group
            1st Alternative: \([^\)\n]+\)
                \( matches the character ( literally
                [^\)\n]+ match a single character not present in the list below
                    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
                    \) matches the character ) literally
                    \n matches a line-feed (newline) character (ASCII 10)
                \) matches the character ) literally
            2nd Alternative: \[[^\]\n]+\]
                \[ matches the character [ literally
                [^\]\n]+ match a single character not present in the list below
                    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
                    \] matches the character ] literally
                    \n matches a line-feed (newline) character (ASCII 10)
                \] matches the character ] literally
    2nd Alternative: (?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,})[./\w\d:/?#\[\]@!$&'()*+,;=\-~%]*
        (?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,}) Non-capturing group
            1st Alternative: \/\w+\/
                \/ matches the character / literally
                \w+ match any word character [a-zA-Z0-9_]
                    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
                \/ matches the character / literally
            2nd Alternative: .:\\
                . matches any character (except newline)
                : matches the character : literally
                \\ matches the character \ literally
            3rd Alternative: \w*:\/\/
                \w* match any word character [a-zA-Z0-9_]
                    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
                : matches the character : literally
                \/ matches the character / literally
                \/ matches the character / literally
            4th Alternative: \.+\/[./\w\d]+
                \.+ matches the character . literally
                    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
                \/ matches the character / literally
                [./\w\d]+ match a single character present in the list below
                    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
                    ./ a single character in the list ./ literally
                    \w match any word character [a-zA-Z0-9_]
                    \d match a digit [0-9]
            5th Alternative: (?:\w+\.\w+){2,}
                (?:\w+\.\w+){2,} Non-capturing group
                    Quantifier: {2,} Between 2 and unlimited times, as many times as possible, giving back as needed [greedy]
                    \w+ match any word character [a-zA-Z0-9_]
                        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
                    \. matches the character . literally
                    \w+ match any word character [a-zA-Z0-9_]
                        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        [./\w\d:/?#\[\]@!$&'()*+,;=\-~%]* match a single character present in the list below
            Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
            ./ a single character in the list ./ literally
            \w match any word character [a-zA-Z0-9_]
            \d match a digit [0-9]
            :/?# a single character in the list :/?# literally
            \[ matches the character [ literally
            \] matches the character ] literally
            @!$&'()*+,;= a single character in the list @!$&'()*+,;= literally (case insensitive)
            \- matches the character - literally
            ~% a single character in the list ~% literally
    g modifier: global. All matches (don't return on first match)
    i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
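
To make the breakdown above concrete, the following Python sketch checks each branch in isolation against a small sample string. The sub-patterns are copied from the explanation; the sample strings are made up for illustration.

```python
import re

# (sub-pattern copied from the breakdown, hypothetical sample it should match)
BRANCHES = [
    (r"\[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])", "[docs](https://example.com)"),
    (r"\/\w+\/", "/usr/"),
    (r".:\\", "C:\\"),
    (r"\w*:\/\/", "ftp://"),
    (r"\.+\/[./\w\d]+", "../lib/util.py"),
    (r"(?:\w+\.\w+){2,}", "www.example.com"),
]

for pattern, sample in BRANCHES:
    # fullmatch() requires the branch to consume the entire sample.
    assert re.fullmatch(pattern, sample, re.IGNORECASE), (pattern, sample)

# The last branch needs at least two dots, so a bare two-label host
# does not match it on its own.
print(re.fullmatch(r"(?:\w+\.\w+){2,}", "example.com"))  # None
```

One consequence worth noting is that a host such as `example.com` is only picked up by the full pattern when a scheme like `https://` precedes it.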