Question

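I'm working on a project where I need to check whether a string is in the correct format, ABC1234, i.e. 3 letters followed by 4 digits. I've been told not to use regular expressions to solve this.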

I came up with the following code, but it's clunky, so I'm looking for something cleaner and more efficient.

String sample = "ABC1234";

char[] chars = sample.toCharArray();

if(Character.isLetter(chars[0]) && Character.isLetter(chars[1]) && 
   Character.isLetter(chars[2]) && Character.isDigit(chars[3]) && 
   Character.isDigit(chars[4]) && Character.isDigit(chars[5]) && 
   Character.isDigit(chars[6])){

    list.add(sample);
}

// OUTPUT: ABC1234 gets added to "list". When it prints, it appears as ABC1234.

All the output is as expected, but I know this could be done more efficiently, or just better overall.

I'm just checking the first 3 characters to confirm they are all letters, and that the last 4 characters are digits.

Any suggestions? Thanks in advance.

Answer 1

You don't need

char[] chars = sample.toCharArray();

Instead, you can just do

// charAt reads one char directly, without copying the String into an array
if (Character.isLetter(sample.charAt(0))) { /* ... */ }

You could also get fancier and do something like:

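void myFonc(String sample) {
    // reject wrong-length input first, otherwise charAt can throw
    if (sample.length() != 7)
        return;

    // first 3 characters must be letters
    for (int i = 0; i < 3; ++i)
        if (!Character.isLetter(sample.charAt(i)))
            return;

    // last 4 characters must be digits
    for (int i = 3; i < 7; ++i)
        if (!Character.isDigit(sample.charAt(i)))
            return;

    list.add(sample);
}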

Answer 2

Here's another way.

String sample = "ABC1234";
if (sample.substring(0, 3).chars().allMatch(Character::isLetter)
      && sample.substring(3).chars().allMatch(Character::isDigit)) {
  list.add(sample);
}

Answer 3

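Since the question says "All the output is as expected, but I know this could be done more efficiently, or just better overall" (and since I like performance), I wrote some benchmarks comparing each answer so I could draw conclusions about efficiency (looking at throughput).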

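The full benchmark code is at the bottom of this answer. If you find any mistakes, I'm happy to correct them (even if it isn't perfect, it should give a good indication of how each answer performs).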

The tests were run on a DigitalOcean droplet with OpenJDK 8, 2 GB of RAM and 2 vCores (Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz), using JMH 1.21.

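Each answer was tested with 3 strings: "ABC1234" to reflect the example in the question, "ABC123D", which should fail, and "AB123", which is too short (not sure whether that case matters to the OP). The tests were configured with 5 forks, 5 warmup iterations of 1 second each, and 20 measurement iterations of 1 second each.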

Results

Benchmark                            (sample)   Mode  Cnt          Score         Error  Units
MyBenchmark.aomine                    ABC1234  thrpt  100    5102477.405 ±   92474.543  ops/s
MyBenchmark.aomine                    ABC123D  thrpt  100    5325954.315 ±  118367.303  ops/s
MyBenchmark.aomine                      AB123  thrpt  100  228544750.370 ± 2972826.551  ops/s
MyBenchmark.azro                      ABC1234  thrpt  100   38550638.399 ±  582816.997  ops/s
MyBenchmark.azro                      ABC123D  thrpt  100   38159991.786 ±  791457.371  ops/s
MyBenchmark.azro                        AB123  thrpt  100   76372552.584 ± 1131365.381  ops/s
MyBenchmark.baselineCarlosDeLaTorre   ABC1234  thrpt  100   37584463.448 ±  444739.798  ops/s
MyBenchmark.baselineCarlosDeLaTorre   ABC123D  thrpt  100   38461464.626 ±  461497.068  ops/s
MyBenchmark.baselineCarlosDeLaTorre     AB123  thrpt  100   52743609.713 ±  590609.005  ops/s
MyBenchmark.elliotFrisch              ABC1234  thrpt  100   16531274.955 ±  313705.782  ops/s
MyBenchmark.elliotFrisch              ABC123D  thrpt  100   16861377.659 ±  361382.816  ops/s
MyBenchmark.elliotFrisch                AB123  thrpt  100  227980231.801 ± 3071776.693  ops/s
MyBenchmark.elliotFrischOptimized     ABC1234  thrpt  100   37031168.714 ±  749067.222  ops/s
MyBenchmark.elliotFrischOptimized     ABC123D  thrpt  100   33383546.778 ±  799217.656  ops/s
MyBenchmark.elliotFrischOptimized       AB123  thrpt  100  214954411.915 ± 5283511.503  ops/s
MyBenchmark.elliotFrischRegex         ABC1234  thrpt  100    6862779.467 ±  122048.790  ops/s
MyBenchmark.elliotFrischRegex         ABC123D  thrpt  100    6830229.583 ±  119561.120  ops/s
MyBenchmark.elliotFrischRegex           AB123  thrpt  100   10797021.026 ±  558964.833  ops/s
MyBenchmark.mark                      ABC1234  thrpt  100   38451993.441 ±  478379.375  ops/s
MyBenchmark.mark                      ABC123D  thrpt  100   37667656.659 ±  680548.809  ops/s
MyBenchmark.mark                        AB123  thrpt  100  228656962.146 ± 2858730.169  ops/s
MyBenchmark.mrB                       ABC1234  thrpt  100   15490382.831 ±  233777.324  ops/s
MyBenchmark.mrB                       ABC123D  thrpt  100     575122.575 ±   10201.967  ops/s
MyBenchmark.mrB                         AB123  thrpt  100  231175971.072 ± 2074819.634  ops/s
MyBenchmark.pradipforever             ABC1234  thrpt  100    5105663.672 ±  171843.786  ops/s
MyBenchmark.pradipforever             ABC123D  thrpt  100    5305419.983 ±   80514.769  ops/s
MyBenchmark.pradipforever               AB123  thrpt  100   12211850.301 ±  217850.395  ops/s

Charts

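There are 2 separate charts because the throughput for AB123 is very large (some methods return false right after comparing the String length); putting it together with the rest (which have much lower throughput) would make the chart unreadable.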

The numbers in the charts represent throughput (executions per second).

Some notes and improvements

mrB

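Since this isn't a complete answer (it only checks the integer part), I used @elliotFrisch's letter validation method for the rest. It is of course fast when the string is ABC1234, but with ABC123D, where a NumberFormatException has to be thrown and caught, performance is poor: roughly 575,000 ops/s, versus about 15.5 million ops/s for ABC1234.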

elliotFrisch

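I looked into why it isn't as fast as the others, despite being very readable, and concluded that it's because s.toCharArray() is called twice: once to validate the letters and once to validate the digits.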

I changed it so that it's only called once; the effect can be seen in the results under elliotFrischOptimized.

azro

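A good solution, but for AB123 it performs worse than the others (roughly 76 million vs. 229 million ops/s) because it calls char[] c = s.toCharArray() and then checks c.length instead of checking s.length() directly. The mark results show an implementation with this check improved, and the gain is clearly visible.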

TL;DR and conclusion

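The original code is already very fast, and adding a length check makes it faster for strings of the wrong length, as azro's answer shows. To make that length check cheaper still (by not calling s.toCharArray() when the length is wrong), use the mark code.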
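Going one step further, here is a minimal sketch that combines the length check with Answer 1's charAt suggestion, so no char[] copy is made at all. The names FormatCheck and isValid are hypothetical, and this variant was not part of the benchmark above, so how it compares with mark is untested.

```java
final class FormatCheck {

    // Hypothetical helper: true if s is exactly 3 letters followed by 4 digits.
    static boolean isValid(String s) {
        // Length check first: cheap rejection of wrong-length input, and it
        // prevents StringIndexOutOfBoundsException in the charAt calls below.
        if (s.length() != 7) {
            return false;
        }
        // charAt reads the String directly, so no char[] is allocated.
        return Character.isLetter(s.charAt(0))
                && Character.isLetter(s.charAt(1))
                && Character.isLetter(s.charAt(2))
                && Character.isDigit(s.charAt(3))
                && Character.isDigit(s.charAt(4))
                && Character.isDigit(s.charAt(5))
                && Character.isDigit(s.charAt(6));
    }

    public static void main(String[] args) {
        System.out.println(isValid("ABC1234")); // true
        System.out.println(isValid("ABC123D")); // false
        System.out.println(isValid("AB123"));   // false
    }
}
```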

If you want a more readable/versatile solution that can be reused, I'd go with the elliotFrischOptimized method, which is (almost) as fast.
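To illustrate the reusable direction, here is a sketch of a hypothetical mask-based checker in which 'L' means "letter" and 'D' means "digit". The mask syntax and the name matchesMask are assumptions, not taken from any answer. It keeps the length-check-then-loop structure of elliotFrischOptimized, but the format becomes data rather than code, so another format such as "LLDD" needs no new method.

```java
final class MaskCheck {

    // Hypothetical mask syntax: 'L' = any letter, 'D' = any digit.
    static boolean matchesMask(String s, String mask) {
        // Cheap length check before looking at any characters.
        if (s.length() != mask.length()) {
            return false;
        }
        for (int i = 0; i < mask.length(); i++) {
            char c = s.charAt(i);
            char m = mask.charAt(i);
            boolean ok;
            if (m == 'L') {
                ok = Character.isLetter(c);
            } else if (m == 'D') {
                ok = Character.isDigit(c);
            } else {
                throw new IllegalArgumentException("Unknown mask character: " + m);
            }
            if (!ok) {
                return false; // stop at the first mismatch, as && would
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesMask("ABC1234", "LLLDDDD")); // true
        System.out.println(matchesMask("ABC123D", "LLLDDDD")); // false
        System.out.println(matchesMask("AB12", "LLDD"));       // true
    }
}
```

The trade-off is an extra comparison per character against the mask, so expect it to be somewhat slower than the hard-coded checks; it has not been benchmarked here.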

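If you don't care too much about performance (it still checks nearly 7 million strings per second, as the results show), the regex provided by @elliotFrisch works, and it's readable and maintainable.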
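One caveat before adopting that regex: \D matches any non-digit character, not only letters, so \D{3}\d{4} also accepts input such as "#$%1234". The sketch below compares it with \p{L}{3}\d{4}, where \p{L} is Java's Unicode letter category (the letters Character.isLetter accepts). Note that \d matches only ASCII 0-9 by default, whereas Character.isDigit also accepts other Unicode digits. The class and field names are illustrative only.

```java
import java.util.regex.Pattern;

final class RegexCheck {

    // Pattern as used in the benchmark: \D means "any non-digit".
    static final Pattern NON_DIGITS_THEN_DIGITS = Pattern.compile("\\D{3}\\d{4}");

    // \p{L} restricts the first three characters to letters.
    static final Pattern LETTERS_THEN_DIGITS = Pattern.compile("\\p{L}{3}\\d{4}");

    public static void main(String[] args) {
        for (String s : new String[] { "ABC1234", "#$%1234", "ABC123D" }) {
            System.out.println(s
                    + " -> \\D: " + NON_DIGITS_THEN_DIGITS.matcher(s).matches()
                    + ", \\p{L}: " + LETTERS_THEN_DIGITS.matcher(s).matches());
        }
        // ABC1234 -> \D: true, \p{L}: true
        // #$%1234 -> \D: true, \p{L}: false
        // ABC123D -> \D: false, \p{L}: false
    }
}
```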

Code

import java.util.regex.Pattern;

import org.openjdk.jmh.annotations.*;

@Fork(5)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 20, time = 1)
@State(Scope.Thread)
public class MyBenchmark {

    @Param({ "ABC1234", "ABC123D", "AB123" })
    String sample;

    Pattern p;

    int goodLength;

    @Setup
    public void setup() {
        this.p = Pattern.compile("\\D{3}\\d{4}");
        this.goodLength = 7;
    }

    @Benchmark
    public boolean baselineCarlosDeLaTorre() {
        char[] chars = this.sample.toCharArray();

        if (Character.isLetter(chars[0]) && Character.isLetter(chars[1]) &&
                Character.isLetter(chars[2]) && Character.isDigit(chars[3]) &&
                Character.isDigit(chars[4]) && Character.isDigit(chars[5]) &&
                Character.isDigit(chars[6])) {
            return true;
        }
        return false;
    }

    @Benchmark
    public boolean mark() {
        if (this.sample.length() != this.goodLength) {
            return false;
        }

        char[] chars = this.sample.toCharArray();

        return Character.isLetter(chars[0]) && Character.isLetter(chars[1]) &&
                Character.isLetter(chars[2]) && Character.isDigit(chars[3]) &&
                Character.isDigit(chars[4]) && Character.isDigit(chars[5]) &&
                Character.isDigit(chars[6]);
    }

    @Benchmark
    public boolean azro() {
        char[] chars = this.sample.toCharArray();

        if (chars.length == this.goodLength && Character.isLetter(chars[0]) &&
                Character.isLetter(chars[1]) && Character.isLetter(chars[2]) &&
                Character.isDigit(chars[3]) && Character.isDigit(chars[4]) &&
                Character.isDigit(chars[5]) && Character.isDigit(chars[6])) {
            return true;
        }
        return false;
    }

    public boolean elliotFrischAllLLettersOptimized(char[] chars, int from, int to) {
        for (int i = from; i < to; i++) {
            if (!Character.isLetter(chars[i])) {
                return false;
            }
        }
        return true;
    }

    public boolean elliotFrischAllDigitsOptimized(char[] chars, int from, int to) {
        for (int i = from; i < to; i++) {
            if (!Character.isDigit(chars[i])) {
                return false;
            }
        }
        return true;
    }

    @Benchmark
    public boolean elliotFrischOptimized() {
        if (this.sample.length() != this.goodLength) {
            return false;
        }

        char[] chars = this.sample.toCharArray();

        return elliotFrischAllLLettersOptimized(chars, 0, 3)
                && elliotFrischAllDigitsOptimized(chars, 3, 7);
    }

    public boolean elliotFrischAllLLetters(String s) {
        for (char ch : s.toCharArray()) {
            if (!Character.isLetter(ch)) {
                return false;
            }
        }
        return true;
    }

    public boolean elliotFrischAllDigits(String s) {
        for (char ch : s.toCharArray()) {
            if (!Character.isDigit(ch)) {
                return false;
            }
        }
        return true;
    }

    @Benchmark
    public boolean elliotFrisch() {
        return this.sample.length() == this.goodLength
                && elliotFrischAllLLetters(this.sample.substring(0, 3))
                && elliotFrischAllDigits(this.sample.substring(3));
    }

    @Benchmark
    public boolean elliotFrischRegex() {
        return this.p.matcher(this.sample).matches();
    }

    @Benchmark
    public boolean aomine() {
        return this.sample.length() == this.goodLength &&
                this.sample.substring(0, 3).codePoints()
                        .allMatch(Character::isLetter)
                && this.sample.substring(3, 7).codePoints()
                        .allMatch(Character::isDigit);
    }

    @Benchmark
    public boolean pradipforever() {
        if (this.sample.substring(0, 3).chars().allMatch(Character::isLetter)
                && this.sample.substring(3).chars().allMatch(Character::isDigit)) {
            return true;
        }
        return false;
    }

    public boolean mrBParseInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    @Benchmark
    public boolean mrB() {
        return this.sample.length() == this.goodLength
                && elliotFrischAllLLetters(this.sample.substring(0, 3))
                && mrBParseInt(this.sample.substring(3));
    }

}

Answer 4

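I would write two utility methods: one that checks whether a given String is all letters, and one that checks whether it is all digits. Then call them on the two relevant substrings, obtained with String.substring(int, int). Like,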

private static boolean allLetters(String s) {
    for (char ch : s.toCharArray()) {
        if (!Character.isLetter(ch)) {
            return false;
        }
    }
    return true;
}

private static boolean allDigits(String s) {
    for (char ch : s.toCharArray()) {
        if (!Character.isDigit(ch)) {
            return false;
        }
    }
    return true;
}

public static void main(String[] args) {
    // ...
    String s = "ABC1234";
    if (s.length() == 7 && allLetters(s.substring(0, 3)) && allDigits(s.substring(3))) {
        list.add(s);
    }
}

However, in real code a regular expression is still better:

// \p{L} matches a letter; \D would match any non-digit, e.g. '#'
Pattern p = Pattern.compile("\\p{L}{3}\\d{4}");
if (p.matcher(s).matches()) {
    // ...
}

Answer 5

The only thing you could add is a length check at the beginning:

if (chars.length == 7 && Character.isLetter(chars[0]) &&
        Character.isLetter(chars[1]) && Character.isLetter(chars[2]) &&
        Character.isDigit(chars[3]) && Character.isDigit(chars[4]) &&
        Character.isDigit(chars[5]) && Character.isDigit(chars[6])) {
    //..
}

Using loops wouldn't be more efficient, because && already short-circuits and stops as soon as it hits a false value.

Answer 6

Your current approach is fine given that you can't use regex, but here's another way:

boolean isValid = sample.length() == 7
        && sample.substring(0, 3).codePoints().allMatch(Character::isLetter)
        && sample.substring(3, 7).codePoints().allMatch(Character::isDigit);

Answer 7

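You could use Integer.parseInt(sample.substring(3, 7)) to check the last 4 characters in one go, though I can't think of anything faster for the letters. If the substring isn't a number, Integer.parseInt throws a NumberFormatException, so do it in a try block.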
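A sketch of that try-block approach, with one caveat: Integer.parseInt accepts a leading '+' or '-' sign, so on its own it would let "ABC+123" or "ABC-123" through. The version below (hypothetical names lastFourAreDigits and isValid) rejects a sign before parsing. As the mrB results in Answer 3 show, the exception path is expensive when many inputs fail.

```java
final class ParseIntCheck {

    // Hypothetical helper: true if characters 3..6 of s form a plain number.
    // Assumes the caller has already checked that s has length 7.
    static boolean lastFourAreDigits(String s) {
        String tail = s.substring(3, 7);
        // parseInt would accept "+123" or "-123", so reject a sign first.
        char first = tail.charAt(0);
        if (first == '+' || first == '-') {
            return false;
        }
        try {
            Integer.parseInt(tail);
            return true;
        } catch (NumberFormatException ex) {
            // Thrown for any non-digit, such as the 'D' in "ABC123D".
            return false;
        }
    }

    static boolean isValid(String s) {
        return s.length() == 7
                && Character.isLetter(s.charAt(0))
                && Character.isLetter(s.charAt(1))
                && Character.isLetter(s.charAt(2))
                && lastFourAreDigits(s);
    }

    public static void main(String[] args) {
        System.out.println(isValid("ABC1234")); // true
        System.out.println(isValid("ABC+123")); // false
        System.out.println(isValid("ABC123D")); // false
    }
}
```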

How to check a string's format without using regex?
