Generating a URL pattern from a set of about 5 URLs

Time: 2010-03-02 10:06:24

Tags: java algorithm string url design-patterns

Given a set of URLs, I need to generate a pattern that matches them.

For example:

http://www.buy.com/prod/disney-s-star-struck/q/loc/109/213724402.html
http://www.buy.com/prod/samsung-f2380-23-widescreen-1080p-lcd-monitor-150-000-1-dc-8ms-1920-x/q/loc/101/211249863.html
http://www.buy.com/prod/panasonic-nnh765wf-microwave-oven-countertop-1-6-ft-1250w-panasonic/q/loc/66357/202045865.html
http://www.buy.com/prod/escape-by-calvin-klein-for-women-3-4-oz-edp-spray/q/loc/66740/211210860.html
http://www.buy.com/prod/v-touch-8gb-mp3-mp4-2-8-touch-screen-2mp-camera-expandable-minisd-w/q/loc/111/211402014.html

The pattern is

  

http://www.buy.com/prod/[^~]/q/loc/[^~].html

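2 answers:

Answer 0: (score: 3)

A naive approach is to split each URL into parts (for example with url.split("/")) and compare the resulting arrays position by position. Where a part is identical in every URL, add it to the pattern as a constant string. Where it differs, add a sub-pattern that matches any value in that position. Here is a simple implementation: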

import java.util.regex.Pattern;

public class UrlPatternDemo {

    public static void main(String[] args) throws Exception {
        String[] urls = {
                "http://www.buy.com/prod/disney-s-star-struck/q/loc/109/213724402.html",
                "http://www.buy.com/prod/samsung-f2380-23-widescreen-1080p-lcd-monitor-150-000-1-dc-8ms-1920-x/q/loc/101/211249863.html",
                "http://www.buy.com/prod/panasonic-nnh765wf-microwave-oven-countertop-1-6-ft-1250w-panasonic/q/loc/66357/202045865.html",
                "http://www.buy.com/prod/escape-by-calvin-klein-for-women-3-4-oz-edp-spray/q/loc/66740/211210860.html",
                "http://www.buy.com/prod/v-touch-8gb-mp3-mp4-2-8-touch-screen-2mp-camera-expandable-minisd-w/q/loc/111/211402014.html"
        };

        // marker for "this part varies"; also the regex used for such parts
        String all = "[^/]+";
        // start from the parts of the first URL and generalize from there
        String[] pattern = urls[0].split("/");
        for (int i = 0; i < urls.length; i++) {
            String[] parts = urls[i].split("/");

            // TODO handle urls with different number of parts
            for (int j = 0; j < pattern.length; j++) {
                // intentionally compare with `all` by reference: a literal URL
                // part that happens to equal "[^/]+" must not be mistaken for the marker
                if (pattern[j] != all && !pattern[j].equals(parts[j])) {
                    pattern[j] = all;
                }
            }
        }

        // build pattern - use [^/]+ as a replacement (anything but a '/')
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < pattern.length; i++) {
            buf.append(pattern[i] == all ? all : Pattern.quote(pattern[i]));
            buf.append("/");
        }
        // strip last "/"
        buf.setLength(buf.length() - 1);

        // compile pattern
        Pattern p = Pattern.compile(buf.toString());

        // output: the pattern, then one match result per input URL
        System.out.println(p.pattern());
        for (int i = 0; i < urls.length; i++) {
            System.out.println(p.matcher(urls[i]).matches());
        }
    }
}

Here is the output of this example:

\Qhttp:\E/\Q\E/\Qwww.buy.com\E/\Qprod\E/[^/]+/\Qq\E/\Qloc\E/[^/]+/[^/]+
true
true
true
true
true

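As you can see, the pattern looks a bit odd. That is because of the quoting: Pattern.quote wraps every constant part in \Q...\E, including the empty part between the two slashes after "http:". Nevertheless, the pattern matches all URLs in this example. There is still some work to do, most notably handling URLs that split into a different number of parts, and recognizing common suffixes (.html).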
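The two open points could be tackled roughly as follows. This is only a sketch under some assumptions, not a finished solution: the class and method names (UrlPatternSketch, escape, commonDotSuffix, generalize) are made up for illustration, a "common suffix" is taken to mean a shared ending that starts with a dot, and URLs of different lengths are handled simply by comparing the leading parts and allowing any extra trailing parts. It also escapes metacharacters individually instead of using Pattern.quote, so the result is easier to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class UrlPatternSketch {

    // Escape regex metacharacters one by one; reads better than \Q...\E.
    static String escape(String literal) {
        return literal.replaceAll("[\\\\.\\[\\]{}()*+?^$|]", "\\\\$0");
    }

    // Longest ending shared by all values, cut so that it begins at a '.'.
    // Returns "" when the values share no dot-suffix such as ".html".
    static String commonDotSuffix(List<String> values) {
        String suffix = values.get(0);
        for (String v : values) {
            int n = 0;
            while (n < suffix.length() && n < v.length()
                    && suffix.charAt(suffix.length() - 1 - n) == v.charAt(v.length() - 1 - n)) {
                n++;
            }
            suffix = suffix.substring(suffix.length() - n);
        }
        int dot = suffix.indexOf('.');
        return dot < 0 ? "" : suffix.substring(dot);
    }

    static Pattern generalize(List<String> urls) {
        List<String[]> rows = new ArrayList<>();
        int shortest = Integer.MAX_VALUE;
        int longest = 0;
        for (String url : urls) {
            String[] parts = url.split("/", -1); // -1 keeps a trailing empty part
            rows.add(parts);
            shortest = Math.min(shortest, parts.length);
            longest = Math.max(longest, parts.length);
        }

        StringBuilder regex = new StringBuilder();
        for (int j = 0; j < shortest; j++) {
            // collect the j-th part of every URL
            List<String> column = new ArrayList<>();
            for (String[] row : rows) {
                column.add(row[j]);
            }
            if (j > 0) {
                regex.append('/');
            }
            if (column.stream().allMatch(column.get(0)::equals)) {
                // same value everywhere: keep it as a literal
                regex.append(escape(column.get(0)));
            } else {
                // varying part: wildcard, followed by a shared suffix if any
                String suffix = commonDotSuffix(column);
                boolean emptyStem = column.stream()
                        .anyMatch(s -> s.length() == suffix.length());
                regex.append(emptyStem ? "[^/]*" : "[^/]+").append(escape(suffix));
            }
        }
        if (longest > shortest) {
            // some URLs have extra parts: accept any number of trailing parts
            regex.append("(?:/[^/]*)*");
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://www.buy.com/prod/disney-s-star-struck/q/loc/109/213724402.html",
                "http://www.buy.com/prod/escape-by-calvin-klein-for-women-3-4-oz-edp-spray/q/loc/66740/211210860.html");
        Pattern p = generalize(urls);
        System.out.println(p.pattern());
        for (String url : urls) {
            System.out.println(p.matcher(url).matches());
        }
    }
}
```

For the five URLs from the question this yields http://www\.buy\.com/prod/[^/]+/q/loc/[^/]+/[^/]+\.html, which is close to the pattern the question asks for. Allowing arbitrary trailing parts is a deliberately loose choice; aligning parts from the end of the URL instead would be stricter, at the cost of more code.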

Answer 1: (score: 3)

You could try txt2re, a nice online tool where you enter a sample string and it generates a regular expression that matches it.

txt2re describes itself as:

  headache relief for programmers :: regular expression generator