Question

I have a JPA AttributeConverter that converts a String to a gzip-compressed byte[] and back.

The method that converts to the database column is very simple:

public byte[] convertToDatabaseColumn(String attribute) {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
         GZIPOutputStream gos = new GZIPOutputStream(baos)) {

        gos.write(attribute.getBytes(StandardCharsets.UTF_8));
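        // finish() writes the remaining compressed data and the gzip trailer,
        // so baos holds a complete gzip stream before toByteArray() is called
        gos.finish();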

        return baos.toByteArray();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

My question is about the method that converts back:

public String convertToEntityAttribute(byte[] dbData) {
    try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(dbData));
         ByteArrayOutputStream baos = new ByteArrayOutputStream()) {

        byte[] buffer = new byte[1024];

        int len;
        while ((len = gis.read(buffer)) > 0) {
            baos.write(buffer, 0, len);
        }

        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

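Given that the byte array dbData is already in memory, is there any point to buffer? Wouldn't it be more "efficient" to read byte by byte into baos, skipping buffer entirely?

The buffer would make sense if the read method were making an underlying OS call, but that is not the case here...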

Answer 1

Given that the byte array dbData is already in memory, is there any point to the buffer?

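Buffers are there for performance. Compared with reading one byte at a time, which is the only alternative, they usually perform better. An OS call is not the only per-call cost: in OpenJDK, every GZIPInputStream read runs the Inflater and updates the CRC-32 checksum, and the single-byte read() simply delegates to the array version with a length of 1, so reading byte by byte pays that overhead once per byte.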
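To make the comparison concrete, here is a minimal sketch (the class and method names are hypothetical) that decompresses the same gzip data both ways: one read() call per byte, and read(byte[]) into a 1024-byte buffer as in the question. It counts read calls rather than timing them, since timings depend on the JVM and the data. It assumes Java 11+ for String.repeat.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReadStrategies {

    // Compresses text the same way as the question's convertToDatabaseColumn.
    static byte[] gzip(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(baos)) {
            gos.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray(); // gos is closed here, so the trailer is written
    }

    // One read() call per decompressed byte; calls[0] counts every call.
    static String byteByByte(byte[] dbData, int[] calls) {
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(dbData))) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            int b;
            while ((b = gis.read()) != -1) {
                calls[0]++;
                baos.write(b);
            }
            calls[0]++; // the final call that returned -1
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // read(byte[]) returns up to buffer.length bytes per call.
    static String buffered(byte[] dbData, int[] calls) {
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(dbData))) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int len;
            while ((len = gis.read(buffer)) != -1) {
                calls[0]++;
                baos.write(buffer, 0, len);
            }
            calls[0]++; // the final call that returned -1
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "0123456789".repeat(1000); // 10,000 bytes of sample text
        byte[] dbData = gzip(text);
        int[] perByte = {0};
        int[] perBuffer = {0};
        System.out.println(byteByByte(dbData, perByte).equals(text));
        System.out.println(buffered(dbData, perBuffer).equals(text));
        System.out.println("read() calls:       " + perByte[0]);
        System.out.println("read(byte[]) calls: " + perBuffer[0]);
    }
}
```

Both strategies return the same string; the byte-by-byte version makes 10,001 calls into GZIPInputStream for 10,000 bytes, while the buffered version makes far fewer.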

Wouldn't it be more "efficient" to read byte by byte into baos, skipping the buffer?

If you could read those bytes directly, you wouldn't be using GZIPInputStream at all: the bytes in dbData are compressed.
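A quick way to see that dbData is not the text itself: gzip data always begins with the two magic bytes 0x1f 0x8b (RFC 1952), and the payload after the header is DEFLATE-compressed. The sketch below assumes dbData was produced by a convertToDatabaseColumn-style method; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderCheck {

    // Compresses text like the question's convertToDatabaseColumn.
    static byte[] gzip(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(baos)) {
            gos.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // True if data starts with the gzip magic number 0x1f 0x8b.
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) {
        byte[] plain = "Hello world".getBytes(StandardCharsets.UTF_8);
        byte[] dbData = gzip("Hello world");
        System.out.println(looksGzipped(plain));  // false
        System.out.println(looksGzipped(dbData)); // true
    }
}
```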

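If you want efficiency and simplicity, I suggest writing to and reading from the ByteArrayOutputStream/ByteArrayInputStream directly through a Writer/Reader, without creating an intermediate byte[].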

public byte[] convertToDatabaseColumn(String text) {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
         Writer out = new OutputStreamWriter(
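                      new GZIPOutputStream(baos), StandardCharsets.UTF_8)) {
        out.write(text);
        // close() flushes the Writer and finishes the gzip stream, so baos
        // holds complete gzip data before toByteArray() is called
        out.close();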
        return baos.toByteArray();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

public String convertToEntityAttribute(byte[] dbData) {
    try (Reader reader = new InputStreamReader(
                         new GZIPInputStream(new ByteArrayInputStream(dbData)),
                         StandardCharsets.UTF_8)) {

        char[] chars = new char[512];
        StringBuilder sb = new StringBuilder();
        for (int len; (len = reader.read(chars)) > 0;)
            sb.append(chars, 0, len);

        return sb.toString();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
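On Java 9 or later, a different technique avoids writing the copy loop yourself: InputStream.readAllBytes() reads the stream to the end and buffers internally. This is a swapped-in alternative, not what the answer above uses; it decodes the bytes once at the end instead of going through an InputStreamReader. The class name is hypothetical, and it omits the JPA @Converter annotation and AttributeConverter interface so it runs standalone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReadAllBytesConverter {

    public byte[] convertToDatabaseColumn(String attribute) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(baos)) {
            gos.write(attribute.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // gos is closed here, so the gzip trailer has been written to baos
        return baos.toByteArray();
    }

    public String convertToEntityAttribute(byte[] dbData) {
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(dbData))) {
            // readAllBytes() (Java 9+) loops over buffered reads internally
            return new String(gis.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ReadAllBytesConverter c = new ReadAllBytesConverter();
        String text = "Gr\u00fc\u00dfe, 0123456789 0123456789";
        System.out.println(c.convertToEntityAttribute(c.convertToDatabaseColumn(text)));
    }
}
```

The buffer still exists; it has just moved inside the JDK method.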

To simplify this further, assuming your string contains no line terminators ('\n' or '\r'), you can do:

public static byte[] convertToDatabaseColumn(String text) {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
         Writer out = new OutputStreamWriter(
                 new GZIPOutputStream(baos), StandardCharsets.UTF_8)) {
        out.write(text);
        out.write("\n");
        out.close();
        return baos.toByteArray();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

public static String convertToEntityAttribute(byte[] dbData) {
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(
                    new GZIPInputStream(new ByteArrayInputStream(dbData)),
                    StandardCharsets.UTF_8))) {

        return br.readLine();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

public static void main(String[] args) throws IOException {
    byte[] bytes = convertToDatabaseColumn("Hello world, 0123456789 0123456789");
    System.out.println(convertToEntityAttribute(bytes));
}
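The readLine() shortcut depends on that assumption: BufferedReader.readLine() stops at the first '\n', '\r' or "\r\n", so anything after it is silently dropped. A minimal sketch showing what happens when the assumption is violated (the class and helper names are hypothetical and mirror the methods above):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReadLineCaveat {

    // Writes text plus a trailing '\n' through a gzip stream.
    static byte[] compressLine(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(baos), StandardCharsets.UTF_8)) {
            out.write(text);
            out.write("\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Returns only the first line of the decompressed text.
    static String decompressLine(byte[] dbData) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(dbData)), StandardCharsets.UTF_8))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decompressLine(compressLine("one line")));      // one line
        System.out.println(decompressLine(compressLine("first\nsecond"))); // first
    }
}
```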

Answer 2

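dbData and buffer represent different things: the former holds the gzipped data, the latter holds decompressed data. There isn't even a one-to-one ratio between input bytes and output bytes; you would hope the compressed data is considerably smaller than the decompressed output!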
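To see that the two sizes are unrelated, the sketch below compresses a highly repetitive string and a one-character string. The exact compressed sizes depend on the Deflater, so it only checks the relationship: the repetitive input shrinks a lot, while a tiny input actually grows because gzip adds a 10-byte header and an 8-byte trailer. The class name is hypothetical; Java 11+ is assumed for String.repeat.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class SizeRatio {

    // Gzip-compresses raw bytes and returns the complete gzip stream.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(baos)) {
            gos.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] repetitive = "0123456789".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] tiny = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println("repetitive: " + repetitive.length + " -> " + gzip(repetitive).length + " bytes");
        System.out.println("tiny:       " + tiny.length + " -> " + gzip(tiny).length + " bytes");
    }
}
```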

Do I need a read buffer for ByteArrayInputStream?
