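Assuming the target string is arbitrary on the one hand, but on the other is guaranteed to contain a decimal number (one or more digits), I came up with the following regex pattern: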
.*?(\d+).*?
So if the target string is, for example, "(this is number 200)", Matcher.group(1) will contain the number.
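For reference, here is a minimal sketch of how that pattern is typically applied. The question does not say whether matches() or find() is used, so this assumes Matcher.matches(), which must consume the whole input and is why the .*? padding is there. The class and method names are hypothetical. Note that "." does not match line terminators unless Pattern.DOTALL is set, so multi-line input would not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the question's pattern with Matcher.matches()
class QuestionPatternDemo {
    // The pattern from the question: lazy .*? on both sides of the digit group
    static final Pattern WRAPPED = Pattern.compile(".*?(\\d+).*?");

    // Returns the first digit run, or null if the whole input does not match
    static String extract(String text) {
        Matcher m = WRAPPED.matcher(text);
        // matches() must cover the entire input, hence the leading/trailing .*?
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("(this is number 200)")); // prints 200
    }
}
```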
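Is there a more optimal regex pattern (or a non-regex approach) for extracting this number?
By "optimal" I mean fastest (ideally using the fewest CPU cycles). Java only.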
Answer 0 (score: 5)
Just (\d+) is more than enough.
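The answer does not show the calling code; a minimal sketch, assuming the pattern is used with Matcher.find(), which searches for the first matching substring anywhere in the input, so no .*? padding is needed. Since the whole match is the number, the capturing parentheses are optional here: reading group() with a bare \d+ gives the same result as reading group(1) with (\d+). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the bare digit pattern used with Matcher.find()
class BareDigitsDemo {
    // find() scans forward for the first run of digits; no surrounding .*? required
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first digit run in text, or null if there is none
    static String firstNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        // group() is the entire match, so a capturing group would be redundant
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("(this is number 200)")); // prints 200
    }
}
```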
Answer 1 (score: 2)
I believe a regex plus parseInt will work well for you. Out of interest, though, I compared it with a simple loop:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberExtractionBenchmark {
    // Matches a run of one or more digits
    public static final Pattern DIGITS = Pattern.compile("(\\d+)");

    public static void main(String[] args) {
        String text = "Some text before a number 123456 and some after";
        // Several rounds; later rounds benefit from JIT warm-up
        for (int i = 0; i < 5; i++) {
            timeRegex(text);
            timeLooping(text);
        }
    }

    private static int timeLooping(String text) {
        int ret = 0;
        final int runs = 1000;
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            ret = 0; // reset so every run parses the number from scratch
            for (int i = 0; i < text.length(); i++) {
                char ch = text.charAt(i);
                if (ch <= '9' && ch >= '0')
                    ret = ret * 10 + ch - '0'; // append this digit
                else if (ret > 0)
                    break; // first non-digit after the number ends it
            }
        }
        long time = System.nanoTime() - start;
        System.out.printf("Took %,d ns to use a loop on average%n", time / runs);
        return ret;
    }

    private static int timeRegex(String text) {
        int ret = 0;
        final int runs = 1000;
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            Matcher m = DIGITS.matcher(text);
            // find() locates the first digit run anywhere in the text
            if (m.find())
                ret = Integer.parseInt(m.group());
        }
        long time = System.nanoTime() - start;
        System.out.printf("Took %,d ns to use a matcher on average%n", time / runs);
        return ret;
    }
}
Output:
Took 19,803 ns to use a matcher on average
Took 85 ns to use a loop on average
Took 12,411 ns to use a matcher on average
Took 83 ns to use a loop on average
Took 8,199 ns to use a matcher on average
Took 79 ns to use a loop on average
Took 11,156 ns to use a matcher on average
Took 104 ns to use a loop on average
Took 4,527 ns to use a matcher on average
Took 94 ns to use a loop on average
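One caveat with the loop above: it uses ret > 0 to detect the end of the number, so if the first number is 0 (e.g. "value 0, then 5") it keeps scanning and returns 5. A minimal sketch of a more defensive variant, using a hypothetical helper named parseFirstNumber, tracks whether a digit has been seen, returns -1 when there is no digit (the -1 convention is my own assumption), and uses Math.multiplyExact/Math.addExact so overflow throws ArithmeticException instead of wrapping silently. Like \d+, it only recognizes ASCII 0-9 and ignores any minus sign.

```java
// Hypothetical helper: loop-based extraction of the first ASCII digit run
class FirstNumber {
    // Returns the first run of digits in text as an int, or -1 if there is none
    static int parseFirstNumber(CharSequence text) {
        int value = 0;
        boolean found = false; // becomes true once the first digit is seen
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= '0' && ch <= '9') {
                // *Exact methods throw ArithmeticException on int overflow
                value = Math.addExact(Math.multiplyExact(value, 10), ch - '0');
                found = true;
            } else if (found) {
                break; // first non-digit after the run ends the number
            }
        }
        return found ? value : -1;
    }

    public static void main(String[] args) {
        System.out.println(parseFirstNumber("Some text before a number 123456 and some after")); // 123456
        System.out.println(parseFirstNumber("value 0, then 5")); // 0
        System.out.println(parseFirstNumber("no digits here")); // -1
    }
}
```

Returning -1 keeps the hot path allocation-free; if callers need to distinguish "no number" more explicitly, java.util.OptionalInt is an alternative at the cost of a small wrapper.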