Which is the best way to match short strings: a regular expression or if/else statements?

Asked: 2013-06-15 12:26:31

Tags: regex if-statement string-matching

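In testOne() I use a regular expression to check whether a string contains any of a few specific substrings.

In testTwo() I use an if/else statement to do the same thing.

I'd like to know why testTwo() is always faster than testOne() in my test cases.

Are regular expressions a poor fit for this problem, or is my regex badly written?

My test code is below. Thanks a lot!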

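import java.util.regex.Pattern;

import org.junit.Test;

public class TestReg {

    // Compiled once and reused. Note that the unescaped '.' inside the
    // alternatives (e.g. "video.sina") matches any character, not just a dot.
    static final Pattern PATT = Pattern
            .compile("(tudou|video.sina|v.youku|v.ku6|tv.sohu|v.163|tv.letv|v.ifeng|v.qq|iqiyi|(5)?6)\\.(com|cn)");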

    @Test
    public void testOne() {
        int count = 0;
        for (int i = 0; i < 10000; i++) {
            for (String vurl : TESTCASES) {
                if (PATT.matcher(vurl).find())
                    count++;
            }
        }
        System.out.println("testOne:" + count);
    }

    @Test
    public void testTwo() {
        int count = 0;
        for (int i = 0; i < 10000; i++) {
            for (String vurl : TESTCASES) {
                if (vurl.indexOf("tudou.com") != -1
                        || vurl.indexOf("video.sina.com") != -1
                        || vurl.indexOf("v.youku.com") != -1
                        || vurl.indexOf("v.ku6.com") != -1
                        || vurl.indexOf("56.com") != -1
                        || vurl.indexOf("tv.sohu.com") != -1
                        || vurl.indexOf("v.163.com") != -1
                        || vurl.indexOf("tv.letv.com") != -1
                        || vurl.indexOf("v.ifeng.com") != -1
                        || vurl.indexOf("v.qq.com") != -1
                        || vurl.indexOf("iqiyi.com") != -1
                        || vurl.indexOf("6.cn") != -1) {
                    count++;
                }
            }
        }
        System.out.println("testTwo:" + count);
    }

    static final String[] TESTCASES = {
            "http://blog.csdn.net/v_july_v/article/details/7624837",
            "http://jobs.douban.com/intern/apply/?type=dev&position=intern_sf",
            "https://class.coursera.org/ml/lecture/index",
            "http://blog.csdn.net/v_july_v/article/details/7624837",
            "http://jobs.douban.com/intern/apply/?type=dev&position=intern_sf",
            "https://class.coursera.org/ml/lecture/index",
            "http://blog.csdn.net/v_july_v/article/details/7624837",
            "http://jobs.douban.com/intern/apply/?type=dev&position=intern_sf",
            "https://class.coursera.org/ml/lecture/index",
            "http://blog.csdn.net/v_july_v/article/details/7624837",
            "http://jobs.douban.com/intern/apply/?type=dev&position=intern_sf",
            "https://class.coursera.org/ml/lecture/index",
            "http://www.56.com/u38/v_NjYyNTUyMjc.html",
            "http://video.sina.com.cn/v/b/69614895-2128825751.html",
            "http://www.tudou.com/programs/view/xcPewAoJ26M",
            "http://v.youku.com/v_show/id_XMzQ0OTI0MTgw.html",
            "http://www.56.com/u87/v_NjMzMjEzNTY.html",
            "http://tv.sohu/u87/v_NjMzMjEzNTY.html",
            "http://tv.letv/u38/v_NjYyNTUyMjc.html",
            "http://v.ifeng/v/b/69614895-2128825751.html",
            "http://v.qq/programs/view/xcPewAoJ26M",
            "http://v.163/v_show/id_XMzQ0OTI0MTgw.html",
            "http://iqiyi/u87/v_NjMzMjEzNTY.html",
            "http://v.6.cn/u87/v_NjMzMjEzNTY.html" };

}

1 answer:

Answer 0 (score: 3)

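I wouldn't use either:

  • Regular expressions are designed to match patterns; they're overkill for exact matches.
  • The || chain is somewhat painful to read and maintain.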

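I'd just use a HashSet<String>. For each URL, first extract the host name using something like the URL class, then check whether it's in the set of hosts you're interested in.

Aside from anything else, this will prevent false positives: your current approach would match

http://www.someotherhost.com/something/tudou.com

... which you don't really want to do.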
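
The host-set idea could be sketched roughly as follows. This is a minimal sketch, not the answerer's own code: the names HostMatcher, VIDEO_HOSTS and isVideoUrl are hypothetical, it uses java.net.URI rather than the URL class to pull out the host, and the loop that also tries each parent domain (so that www.tudou.com is accepted for the entry tudou.com) is an added assumption about the desired behaviour.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class HostMatcher {

    // Hosts of interest, copied from the indexOf checks in the question.
    static final Set<String> VIDEO_HOSTS = new HashSet<>(Arrays.asList(
            "tudou.com", "video.sina.com", "v.youku.com", "v.ku6.com",
            "56.com", "tv.sohu.com", "v.163.com", "tv.letv.com",
            "v.ifeng.com", "v.qq.com", "iqiyi.com", "6.cn"));

    // Returns true if the URL's host, or one of its parent domains,
    // is in VIDEO_HOSTS. The path and query are never inspected.
    static boolean isVideoUrl(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        if (host == null) {
            return false; // no parseable host component
        }
        host = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (VIDEO_HOSTS.contains(host)) {
                return true;
            }
            int dot = host.indexOf('.');
            if (dot < 0) {
                return false;
            }
            // Assumption: strip the leftmost label and retry with the parent domain.
            host = host.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(isVideoUrl("http://www.tudou.com/programs/view/xcPewAoJ26M"));
        System.out.println(isVideoUrl("http://www.someotherhost.com/something/tudou.com"));
    }
}
```

Because the lookup is exact rather than substring-based, a host such as video.sina.com.cn is not covered by the entry video.sina.com and would need its own entry in the set.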