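Regular expression for a PDF file URL

Asked: 2014-12-27 23:08:08

Tags: regex url pdf

I am looking for a regular expression for URLs of this format:

1 answer:

Answer 0 (score: 4)

You can use this general pattern: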

^(https?:\/\/)?www\.([\da-z\.-]+)\.([a-z\.]{2,6})\/[\w \.-]+?\.pdf$

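To see what the general pattern accepts and rejects, here is a minimal Java sketch. The class name `PdfUrlPattern`, the helper `isPdfUrl`, and the sample URLs are made up for illustration; only the regex itself comes from the answer. Note that the pattern requires a `www.` prefix, allows only lowercase letters in the host, and allows a single path segment.

```java
import java.util.regex.Pattern;

public class PdfUrlPattern {
    // The general pattern from the answer, with backslashes doubled for a Java string literal.
    static final Pattern GENERAL = Pattern.compile(
        "^(https?:\\/\\/)?www\\.([\\da-z\\.-]+)\\.([a-z\\.]{2,6})\\/[\\w \\.-]+?\\.pdf$");

    // Hypothetical helper: true when the input matches the general pattern.
    static boolean isPdfUrl(String url) {
        return GENERAL.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.website.com/sample.pdf",      // scheme, www, one path segment
            "www.example.org/my report.pdf",           // scheme is optional, spaces are allowed
            "https://example.com/sample.pdf",          // no "www." prefix
            "https://www.website.com/docs/sample.pdf", // nested path
            "https://www.website.com/sample.txt"       // wrong extension
        };
        for (String s : samples) {
            System.out.println(isPdfUrl(s) + "  " + s);
        }
    }
}
```

The first two samples match and the last three do not. If nested paths or hosts without `www.` need to be accepted, the pattern would have to be loosened accordingly.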

Or, if you want to validate that exact URL, you can use:

^(https?:\/\/)?www\.website\.com\/sample\.pdf$


NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                      the beginning of the string
--------------------------------------------------------------------------------
  (                      group and capture to \1 (optional
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
    http                 'http'
--------------------------------------------------------------------------------
    s?                   's' (optional (matching the most amount
                         possible))
--------------------------------------------------------------------------------
    :                    ':'
--------------------------------------------------------------------------------
    \/                   '/'
--------------------------------------------------------------------------------
    \/                   '/'
--------------------------------------------------------------------------------
  )?                     end of \1 (NOTE: because you are using a
                         quantifier on this capture, only the LAST
                         repetition of the captured pattern will be
                         stored in \1)
--------------------------------------------------------------------------------
  www                    'www'
--------------------------------------------------------------------------------
  \.                     '.'
--------------------------------------------------------------------------------
  website                'website'
--------------------------------------------------------------------------------
  \.                     '.'
--------------------------------------------------------------------------------
  com                    'com'
--------------------------------------------------------------------------------
  \/                     '/'
--------------------------------------------------------------------------------
  sample                 'sample'
--------------------------------------------------------------------------------
  \.                     '.'
--------------------------------------------------------------------------------
  pdf                    'pdf'
--------------------------------------------------------------------------------
  $                      before an optional \n, and the end of the
                         string

Java code:

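import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfUrlCheck {
    public static void main(String[] args) {
        String input = "https://www.website.com/sample.pdf";

        // Each regex backslash is doubled because it sits inside a Java string literal.
        Pattern pattern = Pattern.compile("^(https?:\\/\\/)?www\\.website\\.com\\/sample\\.pdf$");
        Matcher matcher = pattern.matcher(input);

        // find() is sufficient here because the pattern is anchored with ^ and $.
        if (matcher.find()) {
            System.out.println("Match found");
        } else {
            System.out.println("No match found");
        }
    }
}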
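
Two rows of the explanation table are easy to overlook: the optional capture group `\1`, and the fact that `$` also matches just before a trailing newline. The sketch below shows how both behave in Java. The class name `ExactUrlDetails` and the helper `capturedScheme` are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactUrlDetails {
    static final Pattern EXACT =
        Pattern.compile("^(https?:\\/\\/)?www\\.website\\.com\\/sample\\.pdf$");

    // Returns the text captured by group 1 ("http://" or "https://").
    // Returns null when the optional group did not take part in the match,
    // or when the input does not match at all.
    static String capturedScheme(String input) {
        Matcher m = EXACT.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capturedScheme("https://www.website.com/sample.pdf")); // https://
        System.out.println(capturedScheme("www.website.com/sample.pdf"));         // null

        // Without the MULTILINE flag, $ still matches before a final line terminator,
        // so find() accepts this input, while matches() must consume the whole input.
        String withNewline = "www.website.com/sample.pdf\n";
        System.out.println(EXACT.matcher(withNewline).find());    // true
        System.out.println(EXACT.matcher(withNewline).matches()); // false
    }
}
```

If a trailing newline should be rejected, calling `matches()` instead of `find()` is one way to do it.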