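What is the fastest way to convert a substring to an integer in Java without using Integer.parseInt?
I would like to know whether there is a way to avoid parseInt, because it requires me to create a temporary String that is a copy of the substring I want to convert.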
"abcd12345abcd" <-- just want chars 4..8 converted.
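I want to avoid creating a new temporary String, which means not using substring.
If I roll my own, is there a way to avoid the overhead of the array bounds check that I see inside String.charAt(int)?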
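Edit
I got a lot of good information from everyone... along with the usual warnings about premature optimization :) The basic answer is that there is nothing better than String.charAt or char[]. Unsafe code is on its way out (maybe). The compiler can probably optimize away redundant range checks on [].
I ran some benchmarks, and the savings from skipping substring and rolling a specialized parseInt are huge.
32% of the cost of calling Integer.parseInt(str.substring(4,8)) comes from the substring. That does not include the subsequent garbage collection cost.
Integer.parseInt is designed to handle a very wide range of input. By rolling my own parseInt on top of charAt (specific to what our data looks like), I got about a 6x speedup over the substring approach.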
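As a side note for readers on Java 9 or later: the JDK added an Integer.parseInt overload that takes a CharSequence plus begin and end indexes, so a range can be parsed without a substring copy. The sketch below is not part of the benchmark in this post, and how it compares in speed with the hand-rolled loops has not been measured here. The names RangeParseDemo and parseRange are made up for illustration.

```java
// Sketch only: requires Java 9+, where this Integer.parseInt overload exists.
final class RangeParseDemo {
    // Parses the decimal number in s[begin, end) straight from the CharSequence.
    // Throws NumberFormatException if the range contains anything but a valid int.
    static int parseRange(CharSequence s, int begin, int end) {
        return Integer.parseInt(s, begin, end, 10);
    }

    public static void main(String[] args) {
        System.out.println(parseRange("abcd12345abcd", 4, 9)); // prints 12345
    }
}
```

Unlike the specialized loops, this still performs full validation (sign handling, overflow, non-digit detection), so it trades some speed for safety.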
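Following the comment that suggested trying char[] gave a performance gain of about 7x. However, your data must already be in a char[], because converting to a char array is expensive. For parsing text, it seems to make sense to stay in char[] throughout and write a few functions of your own to compare strings.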
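To make the "stay in char[]" idea concrete, here is a minimal sketch of the kind of comparison helper that paragraph has in mind. The names CharRegion and regionEquals are hypothetical and do not come from the benchmark; the helper compares a slice of a char[] against a known String without allocating anything.

```java
final class CharRegion {
    // Returns true if buf[offset, offset + expected.length()) equals expected.
    // No String is created for the region being compared.
    static boolean regionEquals(char[] buf, int offset, String expected) {
        int len = expected.length();
        // Reject ranges that would run off either end of the buffer.
        if (offset < 0 || offset > buf.length - len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf[offset + i] != expected.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] buf = "abcd12345abcd".toCharArray();
        System.out.println(regionEquals(buf, 9, "abcd")); // prints true
    }
}
```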
Benchmark results (smaller is faster):
parseInt(substring) 23731665
parseInt(string) 16859226
Atoi1 7116633
Atoi2 4514031
Atoi3 char[] 4135355
Atoi4 char[] 3503638
Atoi5 char[] 5485495
GetNumber1 8666020
GetNumber2 5951939
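During benchmarking I also tried turning inlining on and off, and verified that the compiler inlined everything correctly.
In case anyone cares, here is my benchmark code...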
package javaatoi;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
public class JavaAtoi {
static int cPasses = 10;
static int cTests = 9;
static int cIter = 0x100000;
static int cString = 0x100;
static int fStringMask = cString - 1;
public static void main(String[] args) throws InterruptedException {
// setup test data. Use a large enough set that the compiler
// wont unroll the loop. Use a small enough set that we are
// keeping the data in L2. I don't want to measure memory loads.
String[] a = new String[cString];
for (int i = 0 ; i< cString ; i+=4) {
// leading zeros will occur, so add one number with one.
a[i+0] = "abcd01234abcd";
a[i+1] = "abcd1234abcd";
a[i+2] = "abcd1234abcd";
a[i+3] = "abcd1234abcd";
}
// array of pre-substringed stuff
String[] a1 = new String[cString];
for (int i=0 ; i< cString ; ++i)
a1[i]= a[i].substring(4,8);
// char array version of the strings
char[][] b = new char[cString][];
for (int i =0 ; i<cString ; ++i)
b[i] = a[i].toCharArray();
// array to hold times for each test for each pass
long[][] t = new long[cPasses][cTests];
// multiple dry runs to let the compiler optimize the functions
for (int i=0 ; i<50 ; ++i) {
t[0][0] = TestParseInt1(a)[0];
t[0][1] = TestParseInt2(a1)[0];
t[0][2] = TestAtoi1(a)[0];
t[0][3] = TestAtoi2(a)[0];
t[0][4] = TestAtoi3(b)[0];
t[0][5] = TestAtoi4(b)[0];
t[0][6] = TestAtoi5(b)[0];
t[0][7] = TestAtoi6(a)[0];
t[0][8] = TestAtoi7(a)[0];
}
// now do a bunch of tests
for (int i=0 ; i<cPasses ; ++i) {
t[i][0] = TestParseInt1(a)[0];
t[i][1] = TestParseInt2(a1)[0];
t[i][2] = TestAtoi1(a)[0];
t[i][3] = TestAtoi2(a)[0];
t[i][4] = TestAtoi3(b)[0];
t[i][5] = TestAtoi4(b)[0];
t[i][6] = TestAtoi5(b)[0];
t[i][7] = TestAtoi6(a)[0];
t[i][8] = TestAtoi7(a)[0];
}
// setup mins - we only care about min time.
t[cPasses-1] = new long[cTests];
for (int i=0 ; i<cTests ; ++i)
t[cPasses-1][i] = 999999999;
for (int j=0 ; j<cTests ; ++j) {
for (int i=0 ; i<cPasses-1 ; ++i) {
long n = t[i][j];
if (n < t[cPasses-1][j])
t[cPasses-1][j] = n;
}
}
// output string
String s = new String();
for (int j=0 ; j<cTests ; ++j) {
for (int i=0 ; i<cPasses ; ++i) {
long n = t[i][j];
s += String.format("%9d", n);
}
s += "\n";
}
System.out.println(s);
// if you comment out the part of TestParseInt1 you can sorta see the
// gc cost.
System.gc(); // Trying to get an idea of the total substring cost
Thread.sleep(1000); // i dunno if this matters. Seems like the gc takes a little while. Not real exact...
long collectionTime = 0;
for (GarbageCollectorMXBean garbageCollectorMXBean : ManagementFactory.getGarbageCollectorMXBeans()) {
long n = garbageCollectorMXBean.getCollectionTime();
if (n > 0)
collectionTime += n;
}
System.out.println(collectionTime*1000000);
}
// you have to put each test function in its own wrapper to
// get the compiler to fairly optimize each test.
// I also made sure I incremented n and used a large # of string
// to make it harder for the compiler to eliminate the loops.
static long[] TestParseInt1(String[] a) {
long n = 0;
long startTime = System.nanoTime();
// comment this out to get an idea of gc cost without the substrings
// then uncomment to get idea of gc cost with substrings
for (int i=0 ; i<cIter ; ++i)
n += Integer.parseInt(a[i&fStringMask].substring(4,8));
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestParseInt2(String[] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Integer.parseInt(a[i&fStringMask]);
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestAtoi1(String[] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Atoi1(a[i&fStringMask], 4, 4);
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestAtoi2(String[] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Atoi2(a[i&fStringMask], 4, 4);
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestAtoi3(char[][] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Atoi3(a[i&fStringMask], 4, 4);
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestAtoi4(char[][] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Atoi4(a[i&fStringMask], 4, 4);
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestAtoi5(char[][] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Atoi5(a[i&fStringMask], 4, 4);
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestAtoi6(String[] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Atoi6(a[i&fStringMask], 4, 4);
return new long[] { System.nanoTime() - startTime, n };
}
static long[] TestAtoi7(String[] a) {
long n = 0;
long startTime = System.nanoTime();
for (int i=0 ; i<cIter ; ++i)
n += Atoi7(a[i&fStringMask], 4, 4);
return new long[] { System.nanoTime() - startTime, n };
}
static int Atoi1(String s, int i0, int cb) {
int n = 0;
boolean fNeg = false; // for unsigned T, this assignment is removed by the optimizer
int i = i0;
int i1 = i + cb;
int ch;
// skip leading crap, scan for -
for ( ; i<i1 && ((ch = s.charAt(i)) > '9' || ch <= '0') ; ++i) {
if (ch == '-')
fNeg = !fNeg;
}
// here is the loop to process the valid number chars.
for ( ; i<i1 ; ++i)
n = n*10 + (s.charAt(i) - '0');
return (fNeg) ? -n : n;
}
static int Atoi2(String s, int i0, int cb) {
int n = 0;
for (int i=i0 ; i<i0+cb ; ++i) {
char ch = s.charAt(i);
n = n*10 + ((ch <= '0') ? 0 : ch - '0');
}
return n;
}
static int Atoi3(char[] s, int i0, int cb) {
int n = 0, i = i0, i1 = i + cb;
// skip leading spaces or zeros
for ( ; i<i1 && s[i] <= '0' ; ++i) { }
// loop to process the valid number chars.
for ( ; i<i1 ; ++i)
n = n*10 + (s[i] - '0');
return n;
}
static int Atoi4(char[] s, int i0, int cb) {
int n = 0;
// loop to process the valid number chars.
for (int i=i0 ; i<i0+cb ; ++i) {
char ch = s[i];
n = n*10 + ((ch <= '0') ? 0 : ch - '0');
}
return n;
}
static int Atoi5(char[] s, int i0, int cb) {
int ch, n = 0, i = i0, i1 = i + cb;
// skip leading crap or zeros
for ( ; i<i1 && ((ch = s[i]) <= '0' || ch > '9') ; ++i) { }
// loop to process the valid number chars.
for ( ; i<i1 && (ch = s[i] - '0') >= 0 && ch <= 9 ; ++i)
n = n*10 + ch;
return n;
}
static int Atoi6(String data, int start, int length) {
int number = 0;
for (int i = start; i < start + length; i++) { // exclusive upper bound: read exactly length chars
if (Character.isDigit(data.charAt(i))) {
number = (number * 10) + (data.charAt(i) - 48);
}
}
return number;
}
static int Atoi7(String data, int start, int length) {
int number = 0;
for (int i = start; i < start + length; i++) { // exclusive upper bound: read exactly length chars
char ch = data.charAt(i);
if (ch >= '0' && ch <= '9') {
number = (number * 10) + (ch - 48);
}
}
return number;
}
}
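Answer 0 (score: 2)
Sorry... there is really no way to do what you want without producing a String, in one form or another, and then parsing it into an int. Java isn't like C++; a String isn't the same as a char[].
As I mentioned before, any operation on a String that returns a String produces a new String instance, so you inevitably end up dealing with a String in some intermediate state.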
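The main point here is that if you actually know the substring boundaries, you should use them to do what you need.
Do not worry about optimization until you can show that this part of the code is your biggest bottleneck. Even then, stick to optimizations that make sense; in Java 8 you could convert the whole String to an IntStream and parse only the elements that are actually digits.
Chances are this code will not be a major performance hit, and optimizing it prematurely will lead you down a very, very painful road.
Realistically, the closest you can get (using Java 8 and the Stream API) is some conversion back and forth between Character and String, but this still produces intermediate Strings: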
System.out.println(Integer.parseInt("abcd12345abcd".chars()
.filter(Character::isDigit)
.mapToObj(c -> (char) c)
.map(Object::toString)
.reduce("", String::concat)));
...which is far harder to read and understand than:
System.out.println(Integer.parseInt("abcd12345abcd".substring(4, 9)));
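One variation on the IntStream idea mentioned above, offered only as a sketch: folding the digits numerically rather than concatenating Strings. This avoids the intermediate String objects, although it still allocates stream machinery and is not claimed to be faster than a plain loop. The names StreamDigits and digitsToInt are invented for this example.

```java
final class StreamDigits {
    // Folds the ASCII digits of s into an int, skipping every other character.
    // The fold is not associative, so it relies on the stream being sequential
    // (String.chars() is sequential unless parallel() is requested).
    // No overflow check is performed.
    static int digitsToInt(String s) {
        return s.chars()
                .filter(c -> c >= '0' && c <= '9')
                .reduce(0, (acc, c) -> acc * 10 + (c - '0'));
    }

    public static void main(String[] args) {
        System.out.println(digitsToInt("abcd12345abcd")); // prints 12345
    }
}
```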
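Answer 1 (score: 1)
Seeing that you want to mimic C/C++ behavior in Java, after some googling I came across http://ssw.jku.at/Research/Papers/Wuerthinger07/ which you may find interesting.
Array Bounds Check Elimination for the Java HotSpot™ Client Compiler
Abstract
Whenever an array element is accessed, Java virtual machines execute a compare instruction to ensure that the index value is within the valid bounds. This reduces the execution speed of Java programs. Array bounds check elimination identifies situations in which such checks are redundant and can be removed. We present an array bounds check elimination algorithm for the Java HotSpot™ VM based on static analysis in the just-in-time compiler.
The algorithm works on an intermediate representation in static single assignment form and maintains conditions for index expressions. It fully removes bounds checks if it can be proven that they never fail. Whenever possible, it moves bounds checks out of loops. The static number of checks remains the same, but a check inside a loop is likely to be executed more often. If such a check fails, the executing program falls back to interpreted mode, avoiding the problem that an exception is thrown at the wrong place.
The evaluation shows a speedup near the theoretical maximum for the scientific SciMark benchmark suite (40% on average). The algorithm also improves the execution speed for the SPECjvm98 benchmark suite (2% on average, 12% maximum).
The full research paper can be found here: http://www.ssw.uni-linz.ac.at/Research/Papers/Wuerthinger07/Wuerthinger07.pdf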
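To illustrate the "move the check out of the loop" idea from the abstract at the source level, here is a hedged sketch: the range is validated once up front, and the loop then runs over a simple counted range of a char[]. Whether a given JIT actually drops the per-access checks for this loop depends on the VM and its version, and nothing here has been measured. The names HoistedCheck and parseDigits are hypothetical, and the method assumes every character in the range is an ASCII digit.

```java
import java.util.Objects;

final class HoistedCheck {
    // Validates [start, start + length) once, then parses the digits.
    // Assumes the range holds only '0'..'9'; no overflow check is done.
    static int parseDigits(char[] s, int start, int length) {
        // Java 9+: throws IndexOutOfBoundsException if the range is invalid.
        Objects.checkFromIndexSize(start, length, s.length);
        int n = 0;
        for (int i = start, end = start + length; i < end; i++) {
            n = n * 10 + (s[i] - '0');
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(parseDigits("abcd12345abcd".toCharArray(), 4, 5)); // prints 12345
    }
}
```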
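Since you know the start and the length of the number within the string, you can still "roll your own" without writing any bounds checks yourself. Either way, you will have to do some extraction to get the number, whether you extract into a temporary string and then convert it, or convert the characters on the fly.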
public static void main(String[] args) throws Exception {
String data = "abcd12345abcd";
System.out.println(getNumber(data, 4, 5));
}
public static int getNumber(String data, int start, int length)
{
int number = 0;
for (int i = start; i < start + length; i++) { // exclusive upper bound: read exactly length chars
char c = data.charAt(i);
if ('0' <= c && c <= '9') {
number = (number * 10) + (c - 48);
}
}
return number;
}
Result:
12345
Use String.replaceAll() to strip out the unwanted characters, then convert/parse what is left.
public static void main(String[] args) throws Exception {
String data = "abcd12345abcd";
int myInt = Integer.valueOf(data.replaceAll("[^0-9]", ""));
System.out.println(myInt);
}
Result:
12345
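If the replaceAll route is used in a hot path, one thing worth knowing is that String.replaceAll compiles its regular expression on every call. A possible refinement, shown here only as a sketch with made-up names (DigitsOnly, NON_DIGITS, parse), is to compile the pattern once and reuse it. It still creates an intermediate String, so it does not address the allocation concern in the question.

```java
import java.util.regex.Pattern;

final class DigitsOnly {
    // Compiled once and reused across calls.
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9]");

    // Removes every non-digit character, then parses the remainder.
    // Throws NumberFormatException if no digits remain or the value overflows.
    static int parse(String data) {
        return Integer.parseInt(NON_DIGITS.matcher(data).replaceAll(""));
    }

    public static void main(String[] args) {
        System.out.println(parse("abcd12345abcd")); // prints 12345
    }
}
```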
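Answer 2 (score: 0)
Bear in mind that this is not how I would normally approach the problem (I would opt for a regular expression to filter out the non-digits). However, the solution below does not create a separate String (apart from the character array).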
public static int getIntegerFromString(String s) {
int multiplier = 1, result = 0; // multiplier needs an initial value, otherwise the code does not compile
boolean inIntegers = false, beforeInteger = true;
char[] chars = s.toCharArray();
char c;
// Iterate through each character, starting at the end
for(int i = chars.length - 1; i >= 0; i--) {
c = chars[i];
if(Character.isDigit(c)) {
// The char is a digit, so we either increase the multiplier (if the previous char was also a digit) or prepare our environment
if(inIntegers) {
multiplier *= 10;
}
else {
inIntegers = true;
beforeInteger = false;
multiplier = 1;
}
result += multiplier * Character.getNumericValue(c);
}
else if(inIntegers) {
// We're done with the sequence of integers. Stop the for-loop.
break;
}
}
return result;
}
[chris@localhost:Projects]$ java Test 3949
3949
[chris@localhost:Projects]$ java Test 3949G
3949
[chris@localhost:Projects]$ java Test E3949G
3949
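The same "last run of digits" behavior can also be written without the toCharArray() copy, by walking the String with charAt. This is a sketch under assumptions rather than a drop-in replacement: the names LastDigits and lastDigitRun are invented, only ASCII digits are recognized, and there is no overflow handling.

```java
final class LastDigits {
    // Returns the value of the last run of ASCII digits in s, or 0 if there is none.
    static int lastDigitRun(String s) {
        int end = s.length();
        // Skip trailing non-digit characters.
        while (end > 0 && !isAsciiDigit(s.charAt(end - 1))) {
            end--;
        }
        // Walk back to the first character of that digit run.
        int start = end;
        while (start > 0 && isAsciiDigit(s.charAt(start - 1))) {
            start--;
        }
        int result = 0;
        for (int i = start; i < end; i++) {
            result = result * 10 + (s.charAt(i) - '0');
        }
        return result;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(lastDigitRun("E3949G")); // prints 3949
    }
}
```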
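Answer 3 (score: -2)
You could try looking at sun.misc.Unsafe. I have never actually used it, but if you want to avoid bounds checks and the like, this (undocumented) class can be used to do so.
See https://stackoverflow.com/questions/5574241/how-can-sun-misc-unsafe-be-used-in-the-real-world
Edit: On the removal of Unsafe in Java 9 (the author argues that removing it is a bad idea because so many libraries use it): http://blog.dripstat.com/removal-of-sun-misc-unsafe-a-disaster-in-the-making/
Using JNI is also possible, but I suspect that calling it incurs a lot of overhead compared with a normal method call (if you already count bounds checking as overhead).
The following link may also be interesting; its author also notes that methods which are called frequently but run only briefly are hard to optimize: https://thinkingandcomputing.com/2014/03/30/eliminating-jni-overhead/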
For how to obtain an Unsafe instance, see: http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/
Example with an unsafe array:
// assumes unsafe is a sun.misc.Unsafe instance obtained beforehand
int[] x = new int[]{1,2,3,4};
final int offset = unsafe.arrayBaseOffset(int[].class);
final int arrayIndexScale = unsafe.arrayIndexScale(int[].class);
for (int i=0;i<4;i++){
unsafe.putInt(x, offset+arrayIndexScale*i, 11*(i+1));
}
System.out.println(Arrays.toString(x));
Output: [11, 22, 33, 44]