Question

I've done a lot of searching, but I'm really bad at regex, and my google-fu isn't strong in this case.

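Scenario:

In push notifications, we pass URLs that contain a 9-digit content ID.

Example URL: http://www.something.com/foo/bar/Some-title-Goes-here-123456789.html (123456789 is the content ID in this scenario)

Current regex for parsing the content ID: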

    public String getContentIdFromPathAndQueryString(String path, String queryString) {
        String contentId = null;
        if (StringUtils.isNonEmpty(path)) {
            Pattern p = Pattern.compile("([\\d]{9})(?=.html)");
            Matcher m = p.matcher(path);
            if (m.find()) {
                contentId = m.group();
            } else if (StringUtils.isNonEmpty(queryString)) {
                p = Pattern.compile("(?:contentId=)([\\d]{9})(?=.html)");
                m = p.matcher(queryString);
                if (m.find()) {
                    contentId = m.group();
                }
            }
        }

        Log.d(LOG_TAG, "Content id " + (contentId == null ? "not found" : (" found - " + contentId)));
        if (StringUtils.isEmpty(contentId)) {
            Answers.getInstance().logCustom(new CustomEvent("eid_url")
                    .putCustomAttribute("contentId", "empty")
                    .putCustomAttribute("path", path)
                    .putCustomAttribute("query", queryString));
        }

        return contentId;
    }

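Problem: this does the job, but I need to account for a specific error case.

Whoever creates the push may enter a content ID of the wrong length, and we need to grab it regardless of what it is, so assume it can be any number of digits... The title can also contain digits, which is annoying. The content ID is always followed by ".html".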

Answer 1

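While the basic answer here is just "replace the {9} limiting quantifier, which matches exactly 9 occurrences, with a + quantifier, which matches 1 or more occurrences", there are two patterns that can be improved.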

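The unescaped dot should be escaped in the pattern to match a literal dot.

If there are no overlapping matches, there is no need for a positive lookahead after the capturing group; just keep the capturing group and grab the .group(1) value.

A non-capturing group (?:...) is still a consuming pattern, so (?:contentId=) is equivalent to contentId= (you can remove the (?: and the )).

There is no need to wrap a single atom inside a character class; use \\d instead of [\\d]. [\\d] is actually a source of misunderstanding: some may think it is a grouping construct and try to put alternative sequences inside the square brackets, whereas [...] matches a single character.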
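To make the first three points concrete, here is a small sketch that compares the patterns side by side. The class name DotAndGroupDemo, the helper firstMatch, and the sample inputs are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotAndGroupDemo {
    // Hypothetical helper: returns the requested group of the first match,
    // or null when the regex does not match anywhere in the input.
    static String firstMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Unescaped dot: "." matches any character, so "Xhtml" is accepted too.
        System.out.println(firstMatch("(\\d+)(?=.html)", "id-123Xhtml", 1));
        // Escaped dot: only a literal ".html" is accepted, so nothing matches here.
        System.out.println(firstMatch("(\\d+)\\.html", "id-123Xhtml", 1));
        // group(0) is the whole match, including the consumed "contentId=" prefix.
        System.out.println(firstMatch("(?:contentId=)(\\d+)", "contentId=42", 0));
        // group(1) is only the digits captured by the parentheses.
        System.out.println(firstMatch("contentId=(\\d+)", "contentId=42", 1));
    }
}
```

The third call also shows why m.group() in the query-string branch of the question would return the "contentId=" prefix together with the digits, not the ID alone.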

So, your code could look like

        Pattern p = Pattern.compile("(\\d+)\\.html");     // No lookahead, + instead of {9}
        Matcher m = p.matcher(path);
        if (m.find()) {
            contentId = m.group(1);                       // (1) refers to Group 1
        } else if (StringUtils.isNonEmpty(queryString)) {
            p = Pattern.compile("contentId=(\\d+)\\.html");
            m = p.matcher(queryString);
            if (m.find()) {
                contentId = m.group(1);
            }
        }
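For trying the patterns outside the app, the snippet above can be wrapped in a standalone class. This is only a sketch under a few assumptions: the class name ContentIdExtractor and the method extract are invented here, the isNonEmpty helper stands in for the StringUtils.isNonEmpty call from the question, and the logging and analytics calls are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentIdExtractor {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern PATH_ID = Pattern.compile("(\\d+)\\.html");
    private static final Pattern QUERY_ID = Pattern.compile("contentId=(\\d+)\\.html");

    // Stand-in for the StringUtils.isNonEmpty helper used in the question.
    private static boolean isNonEmpty(String s) {
        return s != null && !s.isEmpty();
    }

    // Returns the digits directly before ".html", or null when none are found.
    // As in the question, the query string is only tried when the path is non-empty.
    static String extract(String path, String queryString) {
        if (!isNonEmpty(path)) {
            return null;
        }
        Matcher m = PATH_ID.matcher(path);
        if (m.find()) {
            return m.group(1);
        }
        if (isNonEmpty(queryString)) {
            m = QUERY_ID.matcher(queryString);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample inputs are illustrative only.
        System.out.println(extract("/foo/bar/Some-title-Goes-here-123456789.html", null));
        System.out.println(extract("/foo/bar/Top-10-tips-12345.html", null));
        System.out.println(extract("/foo/bar/view", "contentId=1234567.html"));
    }
}
```

In the second sample the "10" in the title is skipped because it is not immediately followed by ".html", so only the trailing "12345" is returned even though it is not 9 digits long.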
