Question

和往常一样，谢谢你。

我正在尝试熟悉regEx，我遇到了与网址匹配的问题。

以下是一个示例网址：

www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

这是我的正则表达式分解：

[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html

[id]是一个长度为22个字符的字符串，可以是大写或小写字母，也可以是数字。但是，我不想从URL中提取它。只是澄清

现在，我需要从此网址中提取两个值。

首先，我需要提取dirs（s）。但是，[dir]是可选的，但也可以是想要的。换句话说，参数不能存在，或者可能是dir1/dir2/dir3 ..等。所以，关闭我的第一个例子：

    www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

在这里，我需要提取dir1/dir2/dir3，其中dir是一个单词的字符串，全部是小写字母（即sports / mlb / games）。 dir中没有数字，仅以此为例。

但是在这个有效URL的例子中：

www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

没有[dir]因此我不会提取任何内容。因此，[dir]是可选的

其次，我需要提取[storyTitle]，[storyTitle]也是可选的，就像上面的[dir]一样，但是如果有storyTitle则只能有一个。{/ p >

所以关掉我之前的例子

www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

在我需要提取'title-of-some-story'的地方有效，其中故事标题是以小写字母划分的短划线字符串。以下示例也有效：

www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

在上面的示例中，没有[storyTitle]因此使其成为可选

最后，为了彻底，没有[dir]且没有[storyTitle]的网址也是有效的。例如：

www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

是有效的网址。任何输入都会有所帮助我希望我很清楚。

Answer 1

这是一个可行的例子。

public static void main(String[] args) {

    Pattern p = Pattern.compile("(?:http://)?.+?(/.+?)?/\\d+/\\d{2}/\\d{2}(/.+?)?/\\w{22}");

    String[] strings ={
            "www.examplesite.com/dir1/dir2/4444/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
    };
    for (int idx = 0; idx < strings.length; idx++) {
        Matcher m = p.matcher(strings[idx]);
        if (m.find()) {
            String dir = m.group(1);
            String title = m.group(2);
            if (title != null) {
                title = title.substring(1); // remove the leading /
            }
            System.out.println(idx+": Dir: "+dir+", Title: "+title);
        }
    }
}

Answer 2

这是一个全正则表达式解决方案。

修改允许http：//

Java来源：

import java.util.*;
import java.lang.*;
import java.util.regex.*;

class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String url = "http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url2 = "www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url3 = "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";

        String patternStr = "(?:http://)?[^/]*[/]?([\\S]*)/[\\d]{4}/[\\d]{2}/[\\d]{2}[/]?([\\S]*)/[\\S]*/[\\S]*";

        // Compile regular expression
        Pattern pattern = Pattern.compile(patternStr);


        // Match 1st url
        System.out.println("Match 1st URL:");
        Matcher matcher = pattern.matcher(url);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }


        // Match 2nd url
        System.out.println("\nMatch 2nd URL:");
        matcher = pattern.matcher(url2);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }


        // Match 3rd url
        System.out.println("\nMatch 3rd URL:");
        matcher = pattern.matcher(url3);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }
    }
}

<强>输出：

Match 1st URL:
URL: http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir
TITLE: title-of-some-story

Match 2nd URL:
URL: www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir/dir2/dir3
TITLE: 

Match 3rd URL:
URL: www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: 
TITLE: title-of-some-story

Java regEx URL匹配问题

2 个答案: