Question

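I am trying to write a Java method that checks a String and allows it to contain only digits and commas. In addition, no number may be repeated.

For example:

11,22,33 - this is OK
22,22,33 - this is not OK

I have a first draft that combines a regex with a Set<String> (below), but I am looking for something better, preferably using only a regex.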

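import java.util.LinkedHashSet;
import java.util.Set;

public boolean isStringOk(String codes) {
    // Only digits and commas are allowed.
    if (codes.matches("^[0-9,]+$")) {
        Set<String> nonRepeatingCodes = new LinkedHashSet<String>();
        for (String c : codes.split(",")) {
            // Reject the string as soon as a number shows up a second time.
            if (nonRepeatingCodes.contains(c)) {
                return false;
            } else {
                nonRepeatingCodes.add(c);
            }
        }
        return true;
    }
    return false;
}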

Does anyone know whether this can be done using only a regex?

Answer 1

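I doubt that this is advisable (as Jarrod Roberson mentioned), because it will be hard for any other coder on your project to understand. But it is possible with a regex alone: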

^(?:(\d+)(?!.*,\1(?!\d)),)*\d+$

The double negative lookahead makes it a bit hard to follow. Here is an explanation:

^                # anchor the regex to the beginning of the string
(?:              # subpattern that matches all numbers, but the last one and all commas
    (\d+)        # capturing group \1, a full number
    (?!          # negative lookahead, that asserts that this number does not occur again
        .*       # consume as much as you want, to look through the whole string
        ,        # match a comma
        \1       # match the number we have already found
        (?!\d)   # make sure that the number has ended (so we don't get false negatives)
    )            # end of lookahead
    ,            # match the comma
)*               # end of subpattern, repeat 0 or more times
\d+              # match the last number
$                # anchor the regex to the end of the string

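Note that this is the general regex, not specific to Java. In Java, you need to escape every backslash, otherwise it will not make it through to the regex engine: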

^(?:(\\d+)(?!.*,\\1(?!\\d)),)*\\d+$
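
To see how the escaped pattern behaves in practice, here is a minimal sketch that compiles it once and runs it against a few inputs. The class name and sample strings are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.regex.Pattern;

public class UniqueNumbersRegexDemo {
    // The pattern from above, with backslashes doubled for a Java string literal.
    private static final Pattern UNIQUE_NUMBERS =
            Pattern.compile("^(?:(\\d+)(?!.*,\\1(?!\\d)),)*\\d+$");

    // Returns true when the comma-separated list contains no repeated number.
    static boolean isStringOk(String codes) {
        return UNIQUE_NUMBERS.matcher(codes).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStringOk("11,22,33")); // true
        System.out.println(isStringOk("22,22,33")); // false: 22 occurs twice
        System.out.println(isStringOk("11,111"));   // true: 11 and 111 are different numbers
        System.out.println(isStringOk("11,,22"));   // false: empty entry between commas
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which String.matches would do.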

Answer 2

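Note that using regular expressions on languages that are technically non-regular can be dangerous, especially for large, non-matching strings. If you are not careful, you can introduce exponential time complexity. In addition, the regex engine has to perform some behind-the-scenes tricks, which may also slow it down.

If you try the other solutions and they cause you problems, you could try it this way instead, using a capturing group together with the Pattern and Matcher classes to keep your code clearer: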

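import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern PATTERN = Pattern.compile("([\\d]+),?");

public static boolean isValid(String str) {
    Matcher matcher = PATTERN.matcher(str);
    Set<Integer> found = new HashSet<Integer>();
    // Visit each number in turn; Set.add returns false for a value already seen.
    while (matcher.find()) {
        if (!found.add(Integer.parseInt(matcher.group(1))))
            return false;
    }
    return true;
}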
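
The method above only looks for numbers and does not check the overall format, and Integer.parseInt throws on values too large for an int. A possible variation, sketched here under the assumption that the input must be digit runs separated by single commas, validates the format first and compares the numbers as strings. The class name and both pattern constants are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueNumbersMatcherDemo {
    // Assumed format rule: one or more digit runs separated by single commas.
    private static final Pattern FORMAT = Pattern.compile("\\d+(?:,\\d+)*");
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static boolean isValid(String str) {
        // Step 1: reject anything that is not a well-formed list of numbers.
        if (!FORMAT.matcher(str).matches()) {
            return false;
        }
        // Step 2: compare numbers as strings, so long digit runs cannot overflow.
        Set<String> found = new HashSet<>();
        Matcher matcher = NUMBER.matcher(str);
        while (matcher.find()) {
            if (!found.add(matcher.group())) {
                return false; // Set.add returned false: number already seen
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("11,22,33"));      // true
        System.out.println(isValid("22,22,33"));      // false
        System.out.println(isValid("11,abc"));        // false: not only digits and commas
        System.out.println(isValid("99999999999,1")); // true: no int overflow
    }
}
```

Comparing as strings means 011 and 11 count as different numbers, which matches how the regex-only answers treat them.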

Answer 3

This is the ugliest regex I could come up with:

return codes.matches("^(?:,?(\\d+)(?=(?:,(?!\\1\\b)\\d+)*$))+$");

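Breakdown:

,? consumes the next comma if there is one (that is, when this is not the start of the string).
(\d+) captures the number in group 1.
(?=(?:,(?!\1\b)\d+)*$) tries to match the remaining numbers, checking each one to make sure it is not the same as the number just captured.

The \b after the backreference prevents false positives on strings like 11,111, where 11 would otherwise be mistaken for a repeat of 111. It is not needed anywhere else, but if you like you can add one after each \d+, which might make the regex a little more efficient. But if you need to tune the regex for maximum performance, making all the quantifiers possessive will have a bigger effect: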

"^(?:,?+(\\d++)(?=(?:,(?!\\1\\b)\\d++)*+$))++$"

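As a quick sanity check, the following sketch runs the plain and the possessive variant side by side on a few sample strings. The class name, helper names, and inputs are chosen here for illustration; the two patterns are copied from the answer above. It only compares results and does not measure performance.

```java
import java.util.regex.Pattern;

public class PossessiveQuantifierDemo {
    // Plain version from the answer above.
    private static final Pattern PLAIN =
            Pattern.compile("^(?:,?(\\d+)(?=(?:,(?!\\1\\b)\\d+)*$))+$");
    // Same pattern with every quantifier made possessive.
    private static final Pattern POSSESSIVE =
            Pattern.compile("^(?:,?+(\\d++)(?=(?:,(?!\\1\\b)\\d++)*+$))++$");

    static boolean plain(String codes) {
        return PLAIN.matcher(codes).matches();
    }

    static boolean possessive(String codes) {
        return POSSESSIVE.matcher(codes).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"11,22,33", "22,22,33", "11,111", "11,,22"};
        for (String s : samples) {
            // Both variants are expected to agree on every sample.
            System.out.println(s + " -> " + plain(s) + " / " + possessive(s));
        }
    }
}
```
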
Answer 4

This regex will do:

^(?=^\d+(?:,\d+)*$)(?!^.*?((?<=(?:^|,))\d+(?=[,$])).*?\1(?:$|,.*?)).*?$

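(?=^\d+(?:,\d+)*$) checks for a valid format, such as 45 or 556,88,33.

(?!^.*?((?<=(?:^|,))\d+(?=[,$])).*?\1(?:$|,.*?)) fails to match if any number is repeated.

.*? matches everything, provided the negative lookahead above returns true.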
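
A small sketch for trying this pattern from Java, with backslashes doubled for the string literal. The class and method names are made up for illustration. One observation from running it: the backreference is not tied to the start of a number, so a list such as 22,122 appears to be rejected even though 22 and 122 are different numbers.

```java
import java.util.regex.Pattern;

public class LookaheadPairDemo {
    // The pattern from this answer, escaped for a Java string literal.
    private static final Pattern NO_DUPLICATES = Pattern.compile(
            "^(?=^\\d+(?:,\\d+)*$)(?!^.*?((?<=(?:^|,))\\d+(?=[,$])).*?\\1(?:$|,.*?)).*?$");

    static boolean isStringOk(String codes) {
        return NO_DUPLICATES.matcher(codes).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStringOk("45"));        // true
        System.out.println(isStringOk("556,88,33")); // true
        System.out.println(isStringOk("22,22,33"));  // false: 22 occurs twice
        System.out.println(isStringOk("11,,22"));    // false: format check fails
        // \1 may match the tail of a longer number, so this valid list is rejected.
        System.out.println(isStringOk("22,122"));    // false
    }
}
```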

Working demo here
