Best regex for extracting a single decimal number from an arbitrary string

Posted: 2011-09-09 14:38:02

Tags: java regex optimization

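Assuming the target string is arbitrary on the one hand, but on the other hand is guaranteed to contain one decimal number (one or more digits), I came up with the following regular expression pattern: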

.*?(\d+).*?

So if the target string is, for example, "(this is number 200)", Matcher.group(1) will contain the number.
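A minimal sketch of how the question's pattern might be used, assuming it is applied with Matcher.matches(), which must consume the whole input, so the two lazy .*? parts absorb the text before and after the number. The class name QuestionPattern and the method extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionPattern {
    // The pattern from the question: lazy prefix, captured digits, lazy suffix.
    private static final Pattern WITH_PADDING = Pattern.compile(".*?(\\d+).*?");

    // Returns the first run of digits as a string, or null if there is none.
    static String extract(String text) {
        Matcher m = WITH_PADDING.matcher(text);
        // matches() must cover the entire input; the .*? parts soak up
        // everything around the digits captured by group 1.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("(this is number 200)")); // prints 200
    }
}
```

By default . does not match line terminators, so with matches() this pattern fails on input containing a newline unless Pattern.DOTALL is set.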

Is there a more optimal regex pattern (or a non-regex approach) for extracting this number?

By "optimal" I mean fastest (ideally, using the fewest CPU cycles). Java only.

2 Answers:

Answer 0 (score: 5)

Just (\d+) is more than enough, as long as it is used with Matcher.find() rather than Matcher.matches().
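A minimal sketch of this suggestion, assuming the pattern is applied with Matcher.find(), which searches for the first substring that matches instead of requiring the whole input to match. The class name FindDigits and the method firstNumber are hypothetical; no capturing group is needed because group() returns the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDigits {
    // No .*? padding: find() scans forward to the first run of digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits as a string, or null if there is none.
    static String firstNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "(this is number 200)";
        System.out.println(firstNumber(text)); // prints 200
        // matches() fails here: \d+ alone cannot cover the whole string.
        System.out.println(DIGITS.matcher(text).matches()); // prints false
    }
}
```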

Answer 1 (score: 2)

I believe a regex plus parseInt will work well for you. Out of interest, though, I compared it with a simple loop.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberExtractBenchmark {
  // Compiled once and reused for every run.
  public static final Pattern DIGITS = Pattern.compile("(\\d+)");

  public static void main(String[] args) {
    String text = "Some text before a number 123456 and some after";
    // Run both timings five times, alternating between them.
    for (int i = 0; i < 5; i++) {
      timeRegex(text);
      timeLooping(text);
    }
  }

  // Hand-written scan: accumulate digits, stop at the first non-digit after the number.
  // Note: ret is not reset between runs, so from the second run on ret > 0
  // and the inner loop breaks on the first (non-digit) character.
  private static int timeLooping(String text) {
    int ret = 0;
    final int runs = 1000;
    long start = System.nanoTime();
    for (int r = 0; r < runs; r++) {
      for (int i = 0; i < text.length(); i++) {
        char ch = text.charAt(i);
        if (ch <= '9' && ch >= '0')
          ret = ret * 10 + ch - '0';
        else if (ret > 0)
          break;
      }
    }
    long time = System.nanoTime() - start;
    System.out.printf("Took %,d ns to use a loop on average%n", time / runs);
    return ret;
  }

  // Regex version: find() locates the digits, Integer.parseInt converts them.
  private static int timeRegex(String text) {
    int ret = 0;
    final int runs = 1000;
    long start = System.nanoTime();
    for (int r = 0; r < runs; r++) {
      Matcher m = DIGITS.matcher(text);
      if (m.find())
        ret = Integer.parseInt(m.group());
    }
    long time = System.nanoTime() - start;
    System.out.printf("Took %,d ns to use a matcher on average%n", time / runs);
    return ret;
  }
}

Prints:

Took 19,803 ns to use a matcher on average
Took 85 ns to use a loop on average
Took 12,411 ns to use a matcher on average
Took 83 ns to use a loop on average
Took 8,199 ns to use a matcher on average
Took 79 ns to use a loop on average
Took 11,156 ns to use a matcher on average
Took 104 ns to use a loop on average
Took 4,527 ns to use a matcher on average
Took 94 ns to use a loop on average
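In the loop benchmark above, ret is declared outside the runs loop and never reset, so from the second run on it is already positive and the inner loop stops at the first character (a non-digit); after the first run, the loop timings cover very little work. The ret > 0 test also means a lone 0 is treated as "no number yet". Below is a minimal sketch of a non-regex scanner that keeps its state local to each call. The class name DigitScanner and the method firstNumber are hypothetical, and it assumes a non-negative number that fits in an int. It uses Integer.parseInt(CharSequence, int, int, int), available since Java 9, so an out-of-range value throws NumberFormatException instead of silently wrapping around.

```java
public class DigitScanner {
    // Hypothetical helper: returns the first run of ASCII digits as an int,
    // or -1 if the text contains no digits. Assumes the value fits in an int.
    static int firstNumber(String text) {
        int n = text.length();
        int i = 0;
        // Skip everything before the first digit.
        while (i < n && !isAsciiDigit(text.charAt(i))) {
            i++;
        }
        if (i == n) {
            return -1; // no digits at all
        }
        int start = i;
        // Extend over the run of digits.
        while (i < n && isAsciiDigit(text.charAt(i))) {
            i++;
        }
        // Java 9+: parse the digit range in place, without creating a substring.
        return Integer.parseInt(text, start, i, 10);
    }

    private static boolean isAsciiDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Some text before a number 123456 and some after")); // prints 123456
        System.out.println(firstNumber("value 0 here")); // prints 0
        System.out.println(firstNumber("no digits"));    // prints -1
    }
}
```

Because every call starts from index 0 with fresh local state, timing this method in a loop repeats the full scan on each run.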