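Java URI regexp takes too long

Time: 2011-05-20 17:21:37

Tags: java regex filter

I have a servlet filter in my Java application that makes sure users always use the current URI of articles and categories. The problem is that, according to the profiler results, this filter accounts for about 40% of the total request time (self time), even for the simple URI "/". That is surprising, because the action behind it is far from trivial: it is a dynamic web page with a huge menu, article rankings, and so on.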

public class NameFilter implements Filter {

    private ArticleServiceIface articleService;
    private CategoryServiceIface categoryService;
    private UrlRewriteServiceIface urlRewriteService;
    private Pattern pattern = Pattern.compile("^(?>.*?)/(article|category)/(\\d+)/(?>.*)$");

    public void init(FilterConfig filterConfig) throws ServletException {
        ApplicationContext ctx = WebApplicationContextUtils.getRequiredWebApplicationContext(filterConfig.getServletContext());
        articleService = (ArticleServiceIface) ctx.getBean("articleService");
        categoryService = (CategoryServiceIface) ctx.getBean("categoryService");
        urlRewriteService = (UrlRewriteServiceIface) ctx.getBean("urlRewriteService");
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        String uri = ((HttpServletRequest) request).getRequestURI();
        Matcher matcher = pattern.matcher(uri);
        String currUri;
        if (matcher.matches()) {
            if (matcher.group(1).equals("article")) {
                Long articleId = Long.valueOf(matcher.group(2));

                ArticleDTO a = articleService.getById(articleId);
                currUri = urlRewriteService.getUrl(a.getId());
            } else {
                Long categoryId = Long.valueOf(matcher.group(2));

                CategoryDTO c = categoryService.getById(categoryId);
                currUri = urlRewriteService.getCategoryUrl(c.getId());
            }
        } else { // matches neither article nor category
            chain.doFilter(request, response);
            return;
        }
        if (currUri.equals(uri)) {
            chain.doFilter(request, response);
        } else {
            HttpServletResponse res = (HttpServletResponse) response;
            res.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
            res.setHeader("Location", currUri);
            res.getWriter().close();
        }


    }

    public void destroy() {
    }
}

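I have spent hours debugging and profiling it and tried many different ways of writing the regular expression, but the result is always the same.

The bottleneck seems to be in the matches method: it is called repeatedly, and at some point, for some reason, it invokes the pattern matching iteratively (several thousand times)...

Thanks for any suggestions.

EDIT: Profiler results (they look strange to me... according to the debugger, this should be parsing URI == "/")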


EDIT2: Current regular expression:

 private static Pattern pattern = Pattern.compile(".*?/(article|category)/(\\d+)/.*");

The result is still the same. I will try measuring it with

  System.out.println(System.currentTimeMillis() - time);

EDIT3: Conclusion: it is probably a NetBeans profiler bug...

I tried this code with the URI "/":

    long time = System.currentTimeMillis();
    if (matcher.matches()) {
        if (matcher.group(1).equals("article")) {
            Long articleId = Long.valueOf(matcher.group(2));

            ArticleDTO a = articleService.getById(articleId);
            currUri = urlRewriteService.getUrl(a.getId());
        } else {
            Long categoryId = Long.valueOf(matcher.group(2));

            CategoryDTO c = categoryService.getById(categoryId);
            currUri = urlRewriteService.getCategoryUrl(c.getId());
        }
    } else { // matches neither article nor category
        System.out.println(System.currentTimeMillis() - time);
        ....

The output is always 0. So it looks to me as if the NetBeans profiler is, for some reason, attributing extra time to this method.
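A caveat on the measurement above: System.currentTimeMillis() only has millisecond resolution, so a single call that takes microseconds will always print 0. A minimal sketch of a finer-grained check, averaging many calls with System.nanoTime(), might look like the following. The class name RegexTiming, the helper averageNanos and the run counts are assumptions made for illustration, and this is only a rough measurement, not a proper benchmark.

```java
import java.util.regex.Pattern;

class RegexTiming {
    // Same expression as in EDIT2 of the question.
    static final Pattern PATTERN = Pattern.compile(".*?/(article|category)/(\\d+)/.*");

    // Written to after each run so the JIT cannot discard the matching work.
    static volatile int sink;

    /** Average time of one matches() call in nanoseconds over the given number of runs. */
    static double averageNanos(String uri, int runs) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            if (PATTERN.matcher(uri).matches()) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = hits;
        return (double) elapsed / runs;
    }

    public static void main(String[] args) {
        averageNanos("/", 100000); // warm-up so the JIT has compiled the hot path
        System.out.println("avg ns per matches(): " + averageNanos("/", 1000000));
    }
}
```

If the average stays far below a millisecond, that would support the conclusion that the time reported by the profiler is instrumentation overhead rather than real matching cost.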

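But thank you all for the help and cooperation; I learned a few regex tricks.

1 Answer:

Answer 0: (score: 1)

There is really no need for the (?>...) groups in the pattern (they are atomic groups, not lookbehinds). The following code works for me and runs quite fast: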

long l = System.currentTimeMillis();
Pattern p = Pattern.compile("^.*?/(article|category)/(\\d+)/.*$");
Matcher m = p.matcher("/category/1012/Grafy");
System.out.println("Matches: " + m.matches());
System.out.println("Group1: " + m.group(1) + ", Group2: " + m.group(2));
System.out.println("Time taken: " + (System.currentTimeMillis()-l));

Output

Matches: true
Group1: category, Group2: 1012
Time taken: 0
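One more reason to drop the (?>...) groups: a lazy quantifier inside an atomic group first matches the empty string, and the atomic group then forbids backtracking to extend it, so (?>.*?) effectively matches nothing. The sketch below compares the pattern from the question with the plain one; the class name AtomicGroupDemo and the sample URIs are chosen here for illustration.

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Pattern from the question: the lazy prefix is wrapped in an atomic group.
    static final Pattern ATOMIC =
            Pattern.compile("^(?>.*?)/(article|category)/(\\d+)/(?>.*)$");
    // Pattern from this answer: plain lazy prefix that may grow on backtracking.
    static final Pattern PLAIN =
            Pattern.compile("^.*?/(article|category)/(\\d+)/.*$");

    public static void main(String[] args) {
        String[] uris = { "/category/1012/Grafy", "/en/article/123/articleName", "/" };
        for (String uri : uris) {
            System.out.println(uri
                    + " atomic=" + ATOMIC.matcher(uri).matches()
                    + " plain=" + PLAIN.matcher(uri).matches());
        }
    }
}
```

With the atomic version, a URI such as /en/article/123/articleName is not recognized, because the prefix can never consume "/en"; only URIs that start directly with /article/ or /category/ match. The plain version accepts both forms.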

EDIT: Try find() instead of matches(), like this:

long l = System.currentTimeMillis();
// find() searches anywhere in the input, so no leading or trailing .* is needed
Pattern p = Pattern.compile("/(article|category)/(\\d+)/");
Matcher m = p.matcher("/en/article/123/articleName");
System.out.println("Matches: " + m.find());
System.out.println("Group1: " + m.group(1) + ", Group2: " + m.group(2));
System.out.println("Time taken: " + (System.currentTimeMillis()-l));
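Note that group() throws an IllegalStateException when find() has returned false, which is what would happen for a URI like "/". A small sketch of how the find() approach could be wrapped for use in the filter follows; the class UriParser and its parse method are hypothetical names, not part of the original code, and the pattern is compiled once as a static constant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UriParser {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern PATTERN =
            Pattern.compile("/(article|category)/(\\d+)/");

    /** Returns {type, id} for the first match, or null if the URI has no such segment. */
    static String[] parse(String uri) {
        Matcher m = PATTERN.matcher(uri);
        if (!m.find()) {
            return null; // calling group() here would throw IllegalStateException
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parsed = parse("/en/article/123/articleName");
        System.out.println(parsed[0] + " " + parsed[1]);
        System.out.println(parse("/") == null);
    }
}
```

In doFilter, a null result would correspond to the branch that simply calls chain.doFilter(request, response).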