Fast conversion of a byte[] containing an ASCII string to int / double / date etc. without using new String

Time: 2012-02-07 07:21:20

Tags: java parsing bytebuffer

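I receive FIX message strings (ASCII) as a ByteBuffer. I parse the tag-value pairs and store the values as primitive objects in a TreeMap, with the tag as the key. So I need to convert each byte[] value to int / double / date etc., depending on its type.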

The simplest way is to create a new String and pass it to the standard converter functions. For example:

int convertToInt(byte[] buffer, int offset, int length)
{
  String valueStr = new String(buffer, offset, length);
  return Integer.parseInt(valueStr);
}

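As far as I know, creating new objects is very cheap in Java. Still, is there a way to convert this ASCII byte[] directly to a primitive type? I tried hand-writing the functions, but found that it was time-consuming and did not give better performance.

Are there any third-party libraries that do this, and most importantly, is it worth doing?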

4 answers:

Answer 0 (score: 2)

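"...and most importantly, is it worth doing?"

Almost certainly not. You should measure whether this is actually a performance bottleneck before making any significant effort to alleviate it.

What is your performance like now? What does it need to be? ("As fast as possible" is not a good goal, or you will never stop. Work out when you can say you are "done".)

Profile the code. Is the real problem in the string creation? Check how often you are garbage collecting, etc. (again, with a profiler).

Each parse type is likely to have different characteristics. For example, when parsing integers, if you find that you get a single digit a significant proportion of the time, you might want to special-case that: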

if (length == 1)
{
    // Keep the value as a byte: assigning a byte to a char needs an explicit cast in Java
    byte b = buffer[index];
    if (b >= '0' && b <= '9')
    {
        return b - '0';
    }
    // Invalid - throw an exception or whatever
}

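...but check how often this case actually occurs before you go down this route. Applying lots of checks for particular optimizations that never actually come up is counter-productive.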
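
A minimal Java sketch of that special case, assuming the question's convertToInt signature: a single-digit value is returned without allocating anything, and every other input falls back to the String-based conversion. The class name SingleDigitFastPath is made up for illustration, and whether the fast path pays off depends on how often single-digit values really occur in your messages.

```java
import java.nio.charset.StandardCharsets;

final class SingleDigitFastPath {
    // Hypothetical helper with the same signature as the question's convertToInt.
    static int convertToInt(byte[] buffer, int offset, int length) {
        if (length == 1) {
            byte b = buffer[offset];
            if (b >= '0' && b <= '9') {
                return b - '0'; // no String is allocated on this path
            }
            throw new NumberFormatException("Not a digit: byte value " + b);
        }
        // General case: the straightforward String-based conversion from the question.
        String valueStr = new String(buffer, offset, length, StandardCharsets.US_ASCII);
        return Integer.parseInt(valueStr);
    }

    public static void main(String[] args) {
        byte[] field = "35=8".getBytes(StandardCharsets.US_ASCII);
        System.out.println(convertToInt(field, 0, 2)); // the tag, via the String path
        System.out.println(convertToInt(field, 3, 1)); // the value, via the fast path
    }
}
```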

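Answer 1 (score: 2)

I agree with Jon; however, when you are processing a lot of FIX messages, this quickly adds up. The method below allows for space-padded numbers. If you need to handle decimals, the code will be slightly different. The speed difference between the two methods is a factor of 11. ConvertToLong results in 0 GCs. The code below is in C#: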

///<summary>
///Converts a byte[] of characters that represent a number into a .net long type. Numbers can be padded from left
/// with spaces.
///</summary>
///<param name="buffer">The buffer containing the number as characters</param>
///<param name="startIndex">The startIndex of the number component</param>
///<param name="endIndex">The EndIndex of the number component</param>
///<returns>The price will be returned as a long from the ASCII characters</returns>
public static long ConvertToLong(this byte[] buffer, int startIndex, int endIndex)
{
    long result = 0;
    for (int i = startIndex; i <= endIndex; i++)
    {
        if (buffer[i] != 0x20)
        {
            // 48 is the decimal value of the '0' character. So to convert the char value
            // of an int to a number we subtract 48. e.g '1' = 49 -48 = 1
            result = result * 10 + (buffer[i] - 48);
        }
    }
    return result;
}

/// <summary>
/// Same as above but converting to string then to long
/// </summary>
public static long ConvertToLong2(this byte[] buffer, int startIndex, int endIndex)
{
    for (int i = startIndex; i <= endIndex; i++)
    {
        if (buffer[i] != 0x20) // 0x20 is the ASCII space character
        {
            return long.Parse(System.Text.Encoding.UTF8.GetString(buffer, i, (endIndex - i) + 1));
        }
    }
    return 0;
}

[Test]
public void TestPerformance(){
    const int iterations = 200 * 1000;
    const int testRuns = 10;
    const int warmUp = 10000;
    const string number = "    123400";
    byte[] buffer = System.Text.Encoding.UTF8.GetBytes(number);

    double result = 0;
    for (int i = 0; i < warmUp; i++){
        result = buffer.ConvertToLong(0, buffer.Length - 1);
    }
    for (int testRun = 0; testRun < testRuns; testRun++){
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < iterations; i++){
            result = buffer.ConvertToLong(0, buffer.Length - 1);
        }
        sw.Stop();
        Console.WriteLine("Test {4}: {0} ticks, {1}ms, 1 conversion takes = {2}μs or {3}ns. GCs: {5}", sw.ElapsedTicks,
            sw.ElapsedMilliseconds, (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000,
            (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000*1000, testRun,
            GC.CollectionCount(0) + GC.CollectionCount(1) + GC.CollectionCount(2));
    }
}
RESULTS
ConvertToLong:
Test 0: 9243 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 1: 8339 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 2: 8425 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 3: 8333 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 4: 8332 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 5: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 6: 8409 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 7: 8334 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 8: 8335 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 9: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
ConvertToLong2:
Test 0: 109067 ticks, 55ms, 1 conversion takes = 0.275000μs or 275.000000ns. GCs: 4
Test 1: 109861 ticks, 56ms, 1 conversion takes = 0.28000μs or 280.00000ns. GCs: 8
Test 2: 102888 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 9
Test 3: 105164 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 10
Test 4: 104083 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 11
Test 5: 102756 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 13
Test 6: 102219 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 14
Test 7: 102086 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 15
Test 8: 102672 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 17
Test 9: 102025 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 18
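
Since the question is about Java, here is a rough Java sketch of the same idea, plus one possible way to handle the decimals mentioned above. The class and method names (AsciiNumbers, parseLong, parseDouble) are made up for illustration. Unlike the C# version, it rejects non-digit bytes instead of silently folding them into the result. The decimal variant assumes plain notation (optional '-', digits, at most one '.'), no exponent, and few enough digits that the accumulated value fits in a long and converts to double exactly (roughly 15 significant digits); it does not check for overflow.

```java
final class AsciiNumbers {
    // Powers of ten that are exactly representable as doubles.
    private static final double[] POW10 = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
        1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18
    };

    // Port of ConvertToLong: endIndex is inclusive, spaces are skipped.
    static long parseLong(byte[] buffer, int startIndex, int endIndex) {
        long result = 0;
        for (int i = startIndex; i <= endIndex; i++) {
            byte b = buffer[i];
            if (b == ' ') {
                continue;
            }
            if (b < '0' || b > '9') {
                throw new NumberFormatException("Unexpected byte " + b + " at index " + i);
            }
            result = result * 10 + (b - '0');
        }
        return result;
    }

    // Accumulates all digits into one long, counts the digits after the
    // decimal point, then scales once at the end.
    static double parseDouble(byte[] buffer, int startIndex, int endIndex) {
        long mantissa = 0;
        int fractionDigits = 0;
        boolean seenPoint = false;
        boolean seenDigit = false;
        boolean negative = false;
        for (int i = startIndex; i <= endIndex; i++) {
            byte b = buffer[i];
            if (b == ' ') {
                continue; // skip padding wherever it occurs, like the C# version
            } else if (b == '-' && !negative && !seenDigit && !seenPoint) {
                negative = true;
            } else if (b == '.' && !seenPoint) {
                seenPoint = true;
            } else if (b >= '0' && b <= '9') {
                mantissa = mantissa * 10 + (b - '0');
                seenDigit = true;
                if (seenPoint) {
                    fractionDigits++;
                }
            } else {
                throw new NumberFormatException("Unexpected byte " + b + " at index " + i);
            }
        }
        if (!seenDigit || fractionDigits >= POW10.length) {
            throw new NumberFormatException("Not a supported decimal number");
        }
        double value = mantissa / POW10[fractionDigits];
        return negative ? -value : value;
    }
}
```

For prices, keeping the mantissa and the number of fraction digits as two integers instead of producing a double may be the better choice, since it avoids binary floating-point rounding altogether.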

Answer 2 (score: 1)

Take a look at ByteBuffer. It has functions for doing this, including handling byte order (endianness).
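
One caveat, shown as a small sketch: ByteBuffer's typed getters such as getInt() decode binary values (four raw bytes in the buffer's byte order), not ASCII digits. For a text protocol like FIX, the useful part of ByteBuffer is the absolute get(int), which reads a byte without copying data or moving the position. The class and method names below are made up for illustration, and the ASCII parser handles unsigned values only, with no overflow check.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferDemo {
    // Parses unsigned ASCII decimal digits in [start, end) using absolute gets,
    // so the buffer's position is untouched and nothing is copied.
    static int parseAsciiInt(ByteBuffer buf, int start, int end) {
        if (start >= end) {
            throw new NumberFormatException("Empty field");
        }
        int result = 0;
        for (int i = start; i < end; i++) {
            int d = buf.get(i) - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException("Unexpected byte at index " + i);
            }
            result = result * 10 + d;
        }
        return result;
    }

    // Reads four raw bytes as a binary int in the requested byte order.
    // A duplicate is used so the caller's byte order setting is not changed.
    static int readBinaryInt(ByteBuffer buf, int index, ByteOrder order) {
        return buf.duplicate().order(order).getInt(index);
    }
}
```

Applied to the four ASCII bytes of "1234", parseAsciiInt gives 1234, while readBinaryInt gives 0x31323334 in big-endian order, because it treats the character codes themselves as the number.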

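Answer 3 (score: 1)

Generally I am not keen on pasting code like this, but anyway, here is how it can be done in about 100 lines (production code). I would not recommend using it as is, but having some reference code is (usually) good.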

package t1;

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;

public class IntParser {
    final static byte[] digits = {
        '0' , '1' , '2' , '3' , '4' , '5' ,
        '6' , '7' , '8' , '9' , 'a' , 'b' ,
        'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
        'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
        'o' , 'p' , 'q' , 'r' , 's' , 't' ,
        'u' , 'v' , 'w' , 'x' , 'y' , 'z'
    };

    static boolean isDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    static int digit(byte b){
        //negative = error

        int result  = b-'0';
        if (result>9)
            result = -1;
        return result;
    }

    static NumberFormatException forInputString(ByteBuffer b){
        byte[] bytes=new byte[b.remaining()];
        b.get(bytes);
        try {
            return new NumberFormatException("bad integer: "+new String(bytes, "8859_1"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }
    public static int parseInt(ByteBuffer b){
        return parseInt(b, 10, b.position(), b.limit());
    }
    public static int parseInt(ByteBuffer b, int radix, int i, int max) throws NumberFormatException{
        int result = 0;
        boolean negative = false;


        int limit;
        int multmin;
        int digit;      

        if (max > i) {
            if (b.get(i) == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;
                i++;
            } else {
                limit = -Integer.MAX_VALUE;
            }
            multmin = limit / radix;
            if (i < max) {
                digit = digit(b.get(i++));
                if (digit < 0) {
                    throw forInputString(b);
                } else {
                    result = -digit;
                }
            }
            while (i < max) {
                // Accumulating negatively avoids surprises near MAX_VALUE
                digit = digit(b.get(i++));
                if (digit < 0) {
                    throw forInputString(b);
                }
                if (result < multmin) {
                    throw forInputString(b);
                }
                result *= radix;
                if (result < limit + digit) {
                    throw forInputString(b);
                }
                result -= digit;
            }
        } else {
            throw forInputString(b);
        }
        if (negative) {
            if (i > b.position()+1) {
                return result;
            } else {    /* Only got "-" */
                throw forInputString(b);
            }
        } else {
            return -result;
        }
    }

}
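
The least obvious part of the code above is that it accumulates the value as a negative number and only flips the sign at the end. Negative ints reach one step further than positive ones, so "-2147483648" can be parsed without overflowing. A stripped-down sketch of just that trick, working on a byte[] with the radix fixed at 10 (the class name NegativeAccumulation is made up for illustration):

```java
final class NegativeAccumulation {
    static int parseInt(byte[] buf, int offset, int length) {
        int i = offset;
        int end = offset + length;
        if (i >= end) {
            throw new NumberFormatException("Empty input");
        }
        boolean negative = buf[i] == '-';
        if (negative) {
            i++;
        }
        if (i >= end) {
            throw new NumberFormatException("No digits");
        }
        // The most negative value the accumulator may reach.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multmin = limit / 10;
        int result = 0;
        while (i < end) {
            int digit = buf[i++] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("Not a digit at index " + (i - 1));
            }
            if (result < multmin) {
                throw new NumberFormatException("Overflow"); // result * 10 would overflow
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("Overflow"); // result - digit would pass limit
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```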