Create a StringBuilder from a byte[]

Time: 2012-06-20 07:51:16

Tags: java memory

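Is there any way to create a StringBuilder from a byte[]?

I'd like to use a StringBuilder to improve memory usage, but what I have to begin with is a byte[]. So I have to create a String from the byte[] and then create a StringBuilder from that String, and I don't see this as an optimal solution.

Thanks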

2 answers:

Answer 0 (score: 13)

Basically, your best option seems to be to use a CharsetDecoder directly.

Here's how:

byte[] srcBytes = getYourSrcBytes();

//Whatever charset your bytes are encoded in
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();

//ByteBuffer.wrap simply wraps the byte array, it does not allocate new memory for it
ByteBuffer srcBuffer = ByteBuffer.wrap(srcBytes);
//Now, we decode our srcBuffer into a new CharBuffer (yes, new memory allocated here, no can do)
CharBuffer resBuffer = decoder.decode(srcBuffer);

//CharBuffer implements the CharSequence interface, which StringBuilder fully supports in its methods
StringBuilder yourStringBuilder = new StringBuilder(resBuffer);
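A self-contained version of the snippet above, offered as a sketch; the class name `DecoderToBuilder` and method `decodeToBuilder` are hypothetical, and UTF-8 is assumed. It also shows one behavioral difference worth knowing: a decoder obtained from `newDecoder()` defaults to `CodingErrorAction.REPORT`, so `decode` throws a `CharacterCodingException` on malformed input, whereas `new String(bytes, charset)` silently substitutes U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class DecoderToBuilder {

    // Decodes UTF-8 bytes into a StringBuilder without an intermediate String.
    // A fresh decoder uses CodingErrorAction.REPORT, so malformed input throws.
    static StringBuilder decodeToBuilder(byte[] src) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // wrap() does not copy the array; decode() allocates the result CharBuffer
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(src));
        // CharBuffer implements CharSequence, so StringBuilder accepts it directly
        return new StringBuilder(chars);
    }

    public static void main(String[] args) {
        byte[] good = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 'a', (byte) 0xFF, (byte) 'b'}; // 0xFF never appears in valid UTF-8
        try {
            System.out.println(decodeToBuilder(good).toString().equals("h\u00e9llo")); // true
            decodeToBuilder(bad); // throws MalformedInputException
        } catch (CharacterCodingException e) {
            System.out.println("decoder rejected input: " + e);
        }
        // new String(...) replaces the bad byte with U+FFFD instead of throwing
        System.out.println(new String(bad, StandardCharsets.UTF_8).equals("a\uFFFDb")); // true
    }
}
```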

Addition:

After some testing, it seems that a plain new String(bytes) is much faster, and there seems to be no simple way to do better. Here is the test I ran:

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.text.ParseException;

public class ConsoleMain {
    public static void main(String[] args) throws IOException, ParseException {
        StringBuilder sb1 = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
        for (int i=0;i<19;i++) {
            sb1.append(sb1);
        }
        System.out.println("Size of buffer: "+sb1.length());
        byte[] src = sb1.toString().getBytes("UTF-8");
        StringBuilder res;

        long startTime = System.currentTimeMillis();
        res = testStringConvert(src);
        System.out.println("Conversion using String time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }

        startTime = System.currentTimeMillis();
        res = testCBConvert(src);
        System.out.println("Conversion using CharBuffer time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }
    }

    private static StringBuilder testStringConvert(byte[] src) throws UnsupportedEncodingException {
        String s = new String(src, "UTF-8");
        StringBuilder b = new StringBuilder(s);
        return b;
    }

    private static StringBuilder testCBConvert(byte[] src) throws CharacterCodingException {
        Charset charset = Charset.forName("UTF-8");
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer srcBuffer = ByteBuffer.wrap(src);
        CharBuffer resBuffer = decoder.decode(srcBuffer);
        StringBuilder b = new StringBuilder(resBuffer);
        return b;
    }
}

Results:

Size of buffer: 13631488
Conversion using String time (msec): 91
Conversion using CharBuffer time (msec): 252

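A modified version (using less memory) on Ideone: Here

Answer 1 (score: 4)

If it's a short statement you want, there is no way around the intermediate String step. For convenience, the String constructor combines conversion and object construction, but StringBuilder has no such convenience constructor.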
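The short form described here, decoding to a `String` first and then copying it into a `StringBuilder`, could look like this sketch (`ViaString` and `viaString` are hypothetical names; UTF-8 is assumed). The `String(byte[], Charset)` constructor does the decoding and the `StringBuilder` constructor then copies those characters, so the text is held twice until the temporary `String` becomes garbage.

```java
import java.nio.charset.StandardCharsets;

public class ViaString {

    // Two steps: byte[] -> String (decodes), then String -> StringBuilder (copies).
    static StringBuilder viaString(byte[] src) {
        return new StringBuilder(new String(src, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] src = "abc".getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = viaString(src);
        sb.append("def"); // the builder is independent of the temporary String
        System.out.println(sb); // abcdef
    }
}
```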

If it's performance you're interested in, you can avoid the intermediate String object by using something like this:

new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)))
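A runnable form of that one-liner, as a sketch; `charsetName` and `inBytes` become method parameters, and the names `DecodeOneLiner` and `toBuilder` are hypothetical. Unlike a decoder from `newDecoder()`, `Charset.decode` always replaces malformed or unmappable input with the charset's replacement string (U+FFFD for UTF-8) rather than throwing, which matches what the `String` constructor does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeOneLiner {

    // Charset.decode returns a CharBuffer, which StringBuilder accepts as a CharSequence.
    // Malformed input is replaced (not reported), so no checked exception is thrown.
    static StringBuilder toBuilder(byte[] inBytes, String charsetName) {
        return new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)));
    }

    public static void main(String[] args) {
        byte[] ok = "xyz".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 'a', (byte) 0xFF, (byte) 'b'};
        System.out.println(toBuilder(ok, "UTF-8")); // xyz
        System.out.println(toBuilder(bad, "UTF-8").toString().equals("a\uFFFDb")); // true
    }
}
```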

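If you want to fine-tune performance, you can control the decoding process yourself. For example, to avoid using too much memory, you can use averageCharsPerByte to estimate how much memory will be needed. If the estimate turns out to be too small, then instead of resizing the buffer you can use the resulting StringBuilder to accumulate all the parts.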

CharsetDecoder cd = Charset.forName(charsetName).newDecoder();
cd.onMalformedInput(CodingErrorAction.REPLACE);
cd.onUnmappableCharacter(CodingErrorAction.REPLACE);
// Math.ceil returns a double, so cast it before using it as a size
int lengthEstimate = (int) Math.ceil(cd.averageCharsPerByte() * inBytes.length) + 1;
ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
CharBuffer outBuf = CharBuffer.allocate(lengthEstimate);
StringBuilder out = new StringBuilder(lengthEstimate);
CoderResult cr;
while (true) {
    cr = cd.decode(inBuf, outBuf, true);
    outBuf.flip();       // switch the buffer from writing to reading the decoded chars
    out.append(outBuf);  // append reads from position to limit
    outBuf.clear();      // reuse the buffer for the next chunk
    if (cr.isUnderflow()) break;               // all input consumed
    if (!cr.isOverflow()) cr.throwException(); // anything but "output full" is an error
}
cr = cd.flush(outBuf);
if (!cr.isUnderflow()) cr.throwException();
outBuf.flip();
out.append(outBuf);
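To check that a chunked decode loop of this kind really accumulates the partial results, here is a self-contained sketch of it as a method. The class `ChunkedDecode` and the `chunkSize` parameter are hypothetical additions for illustration: passing a deliberately tiny size forces the decoder to report OVERFLOW many times, and the result should still equal the full decoded text. The sketch calls `flip()` before each `append`, because `StringBuilder.append(CharSequence)` reads a `CharBuffer` from its position to its limit.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ChunkedDecode {

    // Decodes UTF-8 bytes into a StringBuilder through a fixed-size CharBuffer.
    // chunkSize must be at least 2 so a surrogate pair always fits; otherwise the
    // decoder could report OVERFLOW forever without making progress.
    static StringBuilder decode(byte[] inBytes, int chunkSize) throws CharacterCodingException {
        if (chunkSize < 2) throw new IllegalArgumentException("chunkSize must be >= 2");
        CharsetDecoder cd = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
        CharBuffer outBuf = CharBuffer.allocate(chunkSize);
        StringBuilder out = new StringBuilder();
        while (true) {
            CoderResult cr = cd.decode(inBuf, outBuf, true);
            outBuf.flip();      // make the chars just written readable
            out.append(outBuf); // copy this chunk into the builder
            outBuf.clear();     // reuse the same buffer for the next chunk
            if (cr.isUnderflow()) break;               // all input consumed
            if (!cr.isOverflow()) cr.throwException(); // not expected with REPLACE
        }
        CoderResult cr = cd.flush(outBuf);
        if (!cr.isUnderflow()) cr.throwException();
        outBuf.flip();
        out.append(outBuf);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "caf\u00e9 \u00fcber na\u00efve";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // a 3-char buffer forces several OVERFLOW rounds
        System.out.println(decode(bytes, 3).toString().equals(text)); // true
    }
}
```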

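I doubt the code above is worthwhile in most applications. If an application cares that much about performance, it probably shouldn't be dealing with StringBuilder at all, but should handle everything at the buffer level.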
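As one illustration of staying at the buffer level, this sketch (class `BufferLevel` and method `countMatches` are hypothetical; UTF-8 is assumed) decodes into a `CharBuffer` and searches it with `java.util.regex`, whose `Pattern.matcher` accepts any `CharSequence`. No `String` or `StringBuilder` copy of the whole text is made; the decoded `CharBuffer` is the only character copy.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BufferLevel {

    // Counts regex matches directly in the decoded CharBuffer.
    static int countMatches(byte[] src, Pattern pattern) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(src));
        Matcher m = pattern.matcher(chars); // CharBuffer is a CharSequence
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] src = "alpha beta gamma".getBytes(StandardCharsets.UTF_8);
        // "a\\b" matches an 'a' at the end of a word
        System.out.println(countMatches(src, Pattern.compile("a\\b"))); // 3
    }
}
```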