Question

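Assume the target string is arbitrary on the one hand, but on the other hand is guaranteed to contain one decimal number (one or more digits). I came up with the following regular expression pattern: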

.*?(\d+).*?

So if the target string is, for example, "(this is number 200)", Matcher.group(1) will contain the number.
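For reference, here is a minimal sketch of the pattern from the question in use. It assumes the matcher is run with Matcher.find() (the question does not say whether find() or matches() is used); the class name QuestionPattern and the method extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the question's pattern: the lazy ".*?" prefix
// skips text up to the first digit, and capture group 1 holds the digit run.
final class QuestionPattern {
    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern PATTERN = Pattern.compile(".*?(\\d+).*?");

    // Returns the first run of decimal digits, or null if none is found.
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("(this is number 200)")); // prints 200
    }
}
```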

Is there a more optimal regex pattern (or a non-regex approach) for extracting this number?

By "optimal" I mean fastest (ideally using the fewest CPU cycles). Java only.

Answer 1

Just (\d+) is more than enough.
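A minimal sketch of this suggestion, assuming Matcher.find() is used: find() already scans forward to the first position where the pattern can match, so the leading and trailing .*? add nothing. Because the whole match is the digit run, m.group() returns the same text as group(1), so the capturing parentheses are optional. The class name DigitsOnly is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper: no ".*?" padding, because find() searches forward
// for the first position where "\d+" matches.
final class DigitsOnly {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of decimal digits, or null if the text has none.
    static String extract(String text) {
        Matcher m = DIGITS.matcher(text);
        return m.find() ? m.group() : null; // group() is the whole match
    }

    public static void main(String[] args) {
        System.out.println(extract("(this is number 200)")); // prints 200
    }
}
```

Note that in Java, \d matches only the ASCII digits 0-9 unless the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.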

Answer 2

I believe a regex plus parseInt will work fine for you. However, out of interest, I compared it against a simple loop.

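import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberExtractionTiming {
  // Compiled once and reused for every call.
  public static final Pattern DIGITS = Pattern.compile("(\\d+)");

  public static void main(String[] args) {
    String text = "Some text before a number 123456 and some after";
    // Alternate the two versions several times so the JIT can warm up.
    for (int i = 0; i < 5; i++) {
      timeRegex(text);
      timeLooping(text);
    }
  }

  // Scans the characters by hand and accumulates the first run of digits into an int.
  private static int timeLooping(String text) {
    int ret = 0;
    final int runs = 1000;
    long start = System.nanoTime();
    for (int r = 0; r < runs; r++) {
      ret = 0; // reset per run; without this the value grows across runs and overflows
      for (int i = 0; i < text.length(); i++) {
        char ch = text.charAt(i);
        if (ch <= '9' && ch >= '0')
          ret = ret * 10 + ch - '0';
        else if (ret > 0) // first non-digit after the number: stop
          break;          // (a number that is all zeros leaves ret at 0, so scanning continues)
      }
    }
    long time = System.nanoTime() - start;
    System.out.printf("Took %,d ns to use a loop on average%n", time / runs);
    return ret;
  }

  // Uses the precompiled pattern to find the first digit run, then parses it.
  private static int timeRegex(String text) {
    int ret = 0;
    final int runs = 1000;
    long start = System.nanoTime();
    for (int r = 0; r < runs; r++) {
      Matcher m = DIGITS.matcher(text);
      if (m.find())
        ret = Integer.parseInt(m.group());
    }
    long time = System.nanoTime() - start;
    System.out.printf("Took %,d ns to use a matcher on average%n", time / runs);
    return ret;
  }
}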

This prints:

Took 19,803 ns to use a matcher on average
Took 85 ns to use a loop on average
Took 12,411 ns to use a matcher on average
Took 83 ns to use a loop on average
Took 8,199 ns to use a matcher on average
Took 79 ns to use a loop on average
Took 11,156 ns to use a matcher on average
Took 104 ns to use a loop on average
Took 4,527 ns to use a matcher on average
Took 94 ns to use a loop on average
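The loop above only stops at a non-digit once ret > 0, so if the number is 0 the scan does not stop there and will pick up any later digits, and values larger than Integer.MAX_VALUE overflow silently. A hypothetical variant, sketched below, returns the first digit run as a substring instead; it has not been timed here, and the caller can pass the result to Integer.parseInt or Long.parseLong if a numeric value is needed. The class name FirstDigitRun is an illustrative assumption.

```java
// Hypothetical loop-based helper: returns the first digit run as a substring
// instead of accumulating an int, so "0" and very long numbers are handled.
final class FirstDigitRun {
    // Returns the first maximal run of ASCII digits '0'..'9', or null if none.
    static String extract(String text) {
        int start = -1; // index where the current digit run began, -1 if none yet
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean digit = ch >= '0' && ch <= '9';
            if (digit && start < 0) {
                start = i;                       // first digit: remember where the run begins
            } else if (!digit && start >= 0) {
                return text.substring(start, i); // run ended at a non-digit
            }
        }
        return start >= 0 ? text.substring(start) : null; // run reached the end, or no digits
    }

    public static void main(String[] args) {
        System.out.println(extract("Some text before a number 123456 and some after")); // prints 123456
        System.out.println(extract("value 0 then 42")); // prints 0
    }
}
```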

Best regex for extracting a single decimal number from an arbitrary string
