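How to use ByteBuffer to efficiently pack a header and data layout into one byte array?

Asked: 2017-01-16 23:37:47

Tags: java arrays multithreading thread-safety bytebuffer

I have a header and data that I need to represent in a single byte array. I have a specific format for packing the header into a byte array and a different format for packing the data into a byte array. Once I have both, I need to make one final byte array out of them.

Below is the layout as defined in C++, and I have to do the same in Java.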

// below is my header offsets layout

// addressedCenter must be the first byte
static constexpr uint32_t  addressedCenter      = 0;
static constexpr uint32_t  version              = addressedCenter + 1;
static constexpr uint32_t  numberOfRecords      = version + 1;
static constexpr uint32_t  bufferUsed           = numberOfRecords + sizeof(uint32_t);
static constexpr uint32_t  location             = bufferUsed + sizeof(uint32_t);
static constexpr uint32_t  locationFrom         = location + sizeof(CustomerAddress);
static constexpr uint32_t  locationOrigin       = locationFrom + sizeof(CustomerAddress);
static constexpr uint32_t  partition            = locationOrigin + sizeof(CustomerAddress);
static constexpr uint32_t  copy                 = partition + 1;

// this is the full size of the header
static constexpr uint32_t headerOffset = copy + 1;

And CustomerAddress is a typedef of uint64_t, composed like this:

typedef uint64_t   CustomerAddress;

void client_data(uint8_t datacenter, 
                 uint16_t clientId, 
                 uint8_t dataId, 
                 uint32_t dataCounter,
                 CustomerAddress& customerAddress)
{
    customerAddress = (uint64_t(datacenter) << 56)
                    + (uint64_t(clientId) << 40)
                    + (uint64_t(dataId) << 32)
                    + dataCounter;
}

Below is my data layout:

// below is my data layout -
//
// key type - 1 byte
// key len - 1 byte
// key (variable size = key_len)
// timestamp (sizeof uint64_t)
// data size (sizeof uint16_t)
// data (variable size = data size)

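Problem statement:

Now, for part of the project, I am trying to represent the whole thing in one particular class in Java, so that I can just pass in the necessary fields and it gives me a final byte array that has the header first and then the data:

Below is my DataFrame class: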

public final class DataFrame {
  private final byte addressedCenter;
  private final byte version;
  private final Map<byte[], byte[]> keyDataHolder;
  private final long location;
  private final long locationFrom;
  private final long locationOrigin;
  private final byte partition;
  private final byte copy;

  public DataFrame(byte addressedCenter, byte version,
      Map<byte[], byte[]> keyDataHolder, long location, long locationFrom,
      long locationOrigin, byte partition, byte copy) {
    this.addressedCenter = addressedCenter;
    this.version = version;
    this.keyDataHolder = keyDataHolder;
    this.location = location;
    this.locationFrom = locationFrom;
    this.locationOrigin = locationOrigin;
    this.partition = partition;
    this.copy = copy;
  }

  public byte[] serialize() {
    // All of the data is embedded in a binary array with fixed maximum size 70000
    ByteBuffer byteBuffer = ByteBuffer.allocate(70000);
    byteBuffer.order(ByteOrder.BIG_ENDIAN);

    int numOfRecords = keyDataHolder.size();
    int bufferUsed = getBufferUsed(keyDataHolder); // 36 + dataSize + 1 + 1 + keyLength + 8 + 2;

    // header layout
    byteBuffer.put(addressedCenter); // byte
    byteBuffer.put(version); // byte
    byteBuffer.putInt(numOfRecords); // int
    byteBuffer.putInt(bufferUsed); // int
    byteBuffer.putLong(location); // long
    byteBuffer.putLong(locationFrom); // long
    byteBuffer.putLong(locationOrigin); // long
    byteBuffer.put(partition); // byte
    byteBuffer.put(copy); // byte

    // now the data layout
    for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
      byte keyType = 0;
      byte keyLength = (byte) entry.getKey().length;
      byte[] key = entry.getKey();
      byte[] data = entry.getValue();
      short dataSize = (short) data.length;

      ByteBuffer dataBuffer = ByteBuffer.wrap(data);
      long timestamp = 0;

      if (dataSize > 10) {
        timestamp = dataBuffer.getLong(2);              
      }       

      byteBuffer.put(keyType);
      byteBuffer.put(keyLength);
      byteBuffer.put(key);
      byteBuffer.putLong(timestamp);
      byteBuffer.putShort(dataSize);
      byteBuffer.put(data);
    }
    return byteBuffer.array();
  }

  private int getBufferUsed(final Map<byte[], byte[]> keyDataHolder) {
    int size = 36;
    for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
      size += 1 + 1 + 8 + 2;
      size += entry.getKey().length;
      size += entry.getValue().length;
    }
    return size;
  }  
}

Below is how I use the DataFrame class above:

  public static void main(String[] args) throws IOException {
    // header layout
    byte addressedCenter = 0;
    byte version = 1;

    long location = packCustomerAddress((byte) 12, (short) 13, (byte) 32, (int) 120);
    long locationFrom = packCustomerAddress((byte) 21, (short) 23, (byte) 41, (int) 130);
    long locationOrigin = packCustomerAddress((byte) 21, (short) 24, (byte) 41, (int) 140);

    byte partition = 3;
    byte copy = 0;

    // this map will have key as the actual key and value as the actual data, both in byte array
    // for now I am storing only two entries in this map
    Map<byte[], byte[]> keyDataHolder = new HashMap<byte[], byte[]>();
    for (int i = 1; i <= 2; i++) {
      keyDataHolder.put(generateKey(), getMyData());
    }

    DataFrame records =
        new DataFrame(addressedCenter, version, keyDataHolder, location, locationFrom,
            locationOrigin, partition, copy);

    // this will give me final packed byte array
    // which will have header and data in it.
    byte[] packedArray = records.serialize();
  }

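  private static long packCustomerAddress(byte datacenter, short clientId, byte dataId,
      int dataCounter) {
    // mask each field so negative values are not sign-extended into the higher bits
    return ((long) (datacenter & 0xFF) << 56) | ((long) (clientId & 0xFFFF) << 40)
        | ((long) (dataId & 0xFF) << 32) | (dataCounter & 0xFFFFFFFFL);
  }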

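As you can see in my DataFrame class, I am allocating the ByteBuffer with a predefined size of 70000. Is there a better way to allocate the size I am using when creating the ByteBuffer, instead of a hardcoded 70000?

Also, is there a better way to pack my header and data into one byte array compared to what I am doing? I also need to make sure it is thread safe, since it can be called by multiple threads.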

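2 Answers:

Answer 0 (score: 2)

  

Is there a better way to allocate the size I am using when creating the ByteBuffer, instead of a hardcoded 70000?

There are at least two non-overlapping approaches. You can use both.

One is buffer pooling. You should find out how many buffers you need during peak periods and use a limit above that, e.g. max + max / 2, max + average, max + mode, or 2 * max.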

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.function.Consumer;
import java.util.function.Function;

public class ByteBufferPool {
    private final int bufferCapacity;
    private final LinkedBlockingDeque<ByteBuffer> queue;

    public ByteBufferPool(int limit, int bufferCapacity) {
        if (limit < 0) throw new IllegalArgumentException("limit must not be negative.");
        if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");

        this.bufferCapacity = bufferCapacity;
        this.queue = (limit == 0) ? null : new LinkedBlockingDeque<>(limit);
    }

    public ByteBuffer acquire() {
        ByteBuffer buffer = (queue == null) ? null : queue.pollFirst();
        if (buffer == null) {
            buffer = ByteBuffer.allocate(bufferCapacity);
        }
        else {
            buffer.clear();
            buffer.order(ByteOrder.BIG_ENDIAN);
        }
        return buffer;
    }

    public boolean release(ByteBuffer buffer) {
        if (buffer == null) throw new IllegalArgumentException("buffer must not be null.");
        if (buffer.capacity() != bufferCapacity) throw new IllegalArgumentException("buffer has unsupported capacity.");
        if (buffer.isDirect()) throw new IllegalArgumentException("buffer must not be direct.");
        if (buffer.isReadOnly()) throw new IllegalArgumentException("buffer must not be read-only.");

        return (queue == null) ? false : queue.offerFirst(buffer);
    }

    public void withBuffer(Consumer<ByteBuffer> action) {
        if (action == null) throw new IllegalArgumentException("action must not be null.");

        ByteBuffer buffer = acquire();
        try {
            action.accept(buffer);
        }
        finally {
            release(buffer);
        }
    }

    public <T> T withBuffer(Function<ByteBuffer, T> function) {
        if (function == null) throw new IllegalArgumentException("function must not be null.");

        ByteBuffer buffer = acquire();
        try {
            return function.apply(buffer);
        }
        finally {
            release(buffer);
        }
    }

    public <T> CompletionStage<T> withBufferAsync(Function<ByteBuffer, CompletionStage<T>> asyncFunction) {
        if (asyncFunction == null) throw new IllegalArgumentException("asyncFunction must not be null.");

        ByteBuffer buffer = acquire();
        CompletionStage<T> future = null;
        try {
            future = asyncFunction.apply(buffer);
        }
        finally {
            if (future == null) {
                release(buffer);
            }
            else {
                future = future.whenComplete((result, throwable) -> release(buffer));
            }
        }
        return future;
    }
}

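The withBuffer methods allow direct use of the pool, while acquire and release allow separating the acquisition point from the release point.

Another is segregating the serialization interface, e.g. put, putInt and putLong, so that you can implement both a byte-counting class and an actual byte-buffering class. You should add a method to such an interface to tell whether the serializer is counting bytes or buffering, to avoid generating bytes unnecessarily, and another method to increment the byte usage directly, which is useful when calculating the size of a string in some encoding without actually serializing it.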
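To make the separate acquire/release style concrete, here is a minimal usage sketch. MiniPool is a hypothetical, stripped-down stand-in for the ByteBufferPool above (no argument checks, limit must be at least 1), included only so the example compiles on its own; packHeader is likewise an illustrative name.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.LinkedBlockingDeque;

public class PoolUsageSketch {
    // Hypothetical stand-in for ByteBufferPool, reduced to acquire/release.
    static final class MiniPool {
        private final int capacity;
        private final LinkedBlockingDeque<ByteBuffer> queue;

        MiniPool(int limit, int capacity) {
            this.capacity = capacity;
            this.queue = new LinkedBlockingDeque<>(limit);
        }

        ByteBuffer acquire() {
            ByteBuffer buffer = queue.pollFirst();
            if (buffer == null) {
                return ByteBuffer.allocate(capacity); // pool empty: allocate a fresh one
            }
            buffer.clear(); // reset position and limit before reuse
            return buffer;
        }

        boolean release(ByteBuffer buffer) {
            return queue.offerFirst(buffer); // false if the pool is already full
        }
    }

    // Explicit acquire/release: copy the written bytes out before the buffer goes back.
    static byte[] packHeader(MiniPool pool, byte addressedCenter, byte version) {
        ByteBuffer buffer = pool.acquire();
        try {
            buffer.put(addressedCenter).put(version);
            buffer.flip(); // switch from writing to reading
            byte[] result = new byte[buffer.remaining()];
            buffer.get(result);
            return result;
        } finally {
            pool.release(buffer); // always return the buffer, even on exceptions
        }
    }
}
```

The copy at the end matters: once the buffer is released, another thread may acquire and overwrite it, so nothing should keep a reference to it or to its backing array.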

public interface ByteSerializer {
    ByteSerializer put(byte value);

    ByteSerializer putInt(int value);

    ByteSerializer putLong(long value);

    boolean isSerializing();

    ByteSerializer add(int bytes);

    int position();
}

public class ByteCountSerializer implements ByteSerializer {
    private int count = 0;

    @Override
    public ByteSerializer put(byte value) {
        count += 1;
        return this;
    }

    @Override
    public ByteSerializer putInt(int value) {
        count += 4;
        return this;
    }

    @Override
    public ByteSerializer putLong(long value) {
        count += 8;
        return this;
    }

    @Override
    public boolean isSerializing() {
        return false;
    }

    @Override
    public ByteSerializer add(int bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");

        count += bytes;
        return this;
    }

    @Override
    public int position() {
        return count;
    }
}

import java.nio.ByteBuffer;

public class ByteBufferSerializer implements ByteSerializer {
    private final ByteBuffer buffer;

    public ByteBufferSerializer(int bufferCapacity) {
        if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");

        this.buffer = ByteBuffer.allocate(bufferCapacity);
    }

    @Override
    public ByteSerializer put(byte value) {
        buffer.put(value);
        return this;
    }

    @Override
    public ByteSerializer putInt(int value) {
        buffer.putInt(value);
        return this;
    }

    @Override
    public ByteSerializer putLong(long value) {
        buffer.putLong(value);
        return this;
    }

    @Override
    public boolean isSerializing() {
        return true;
    }

    @Override
    public ByteSerializer add(int bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");

        for (int b = 0; b < bytes; b++) {
            buffer.put((byte)0);
        }
        return this;
        // or throw new UnsupportedOperationException();
    }

    @Override
    public int position() {
        return buffer.position();
    }

    public ByteBuffer buffer() {
        return buffer;
    }
}

In your code, you would do something along these lines (untested):

ByteCountSerializer counter = new ByteCountSerializer();
dataFrame.serialize(counter);
ByteBufferSerializer serializer = new ByteBufferSerializer(counter.position());
dataFrame.serialize(serializer);
ByteBuffer buffer = serializer.buffer();
// ... write buffer, ?, profit ...

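Your DataFrame.serialize method should be refactored to accept a ByteSerializer, and wherever it would generate data it should check isSerializing to decide whether it should only calculate the size or actually write bytes.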
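As a hedged illustration of the same count-then-write idea, here is a minimal sketch that skips the ByteSerializer interface and simply computes the exact size before allocating. The class name ExactSizeSketch and its method names are made up for this example; the field order, the 36-byte header and the timestamp rule are taken from the question's code.

```java
import java.nio.ByteBuffer;
import java.util.Map;

public class ExactSizeSketch {
    // Header: 1 + 1 + 4 + 4 + 8 + 8 + 8 + 1 + 1 bytes, per the question's layout.
    static final int HEADER_SIZE = 36;
    // Per record: key type (1) + key length (1) + timestamp (8) + data size (2).
    static final int RECORD_OVERHEAD = 12;

    // First pass: count the bytes without writing anything.
    static int sizeOf(Map<byte[], byte[]> records) {
        int size = HEADER_SIZE;
        for (Map.Entry<byte[], byte[]> e : records.entrySet()) {
            size += RECORD_OVERHEAD + e.getKey().length + e.getValue().length;
        }
        return size;
    }

    // Second pass: allocate exactly that many bytes and fill them.
    static byte[] serialize(byte addressedCenter, byte version, long location,
                            long locationFrom, long locationOrigin,
                            byte partition, byte copy, Map<byte[], byte[]> records) {
        int size = sizeOf(records);
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        buf.put(addressedCenter).put(version);
        buf.putInt(records.size()).putInt(size);
        buf.putLong(location).putLong(locationFrom).putLong(locationOrigin);
        buf.put(partition).put(copy);
        for (Map.Entry<byte[], byte[]> e : records.entrySet()) {
            byte[] key = e.getKey();
            byte[] data = e.getValue();
            // Same rule as the question: read a timestamp at offset 2 of long enough data.
            long timestamp = data.length > 10 ? ByteBuffer.wrap(data).getLong(2) : 0L;
            buf.put((byte) 0).put((byte) key.length).put(key);
            buf.putLong(timestamp).putShort((short) data.length).put(data);
        }
        return buf.array(); // the backing array is exactly 'size' bytes long
    }
}
```

Because the buffer is sized from the data, the returned array carries no trailing padding, unlike the array of a fixed 70000-byte buffer.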

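I will leave combining the two approaches as an exercise, mainly because it largely depends on how you decide to do it.

For instance, you may make ByteBufferSerializer use the pool directly and keep an arbitrary capacity (e.g. your 70000); you may pool ByteBuffers by capacity (but instead of the needed capacity, use the least power of 2 greater than the needed capacity, and set the buffer's limit before returning it from acquire); or you may pool ByteBufferSerializers directly, as long as you add a reset() method.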
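The power-of-two variant can be sketched as follows. This is only an illustration under assumptions: bucketCapacity and acquire are hypothetical names, the helper rounds up to the smallest power of two that is greater than or equal to the requested size, and the actual per-bucket pooling is left out.

```java
import java.nio.ByteBuffer;

public class PowerOfTwoBuffers {
    // Smallest power of two >= needed (hypothetical helper for bucketing capacities).
    static int bucketCapacity(int needed) {
        if (needed < 0 || needed > (1 << 30)) {
            throw new IllegalArgumentException("needed must be in [0, 2^30]");
        }
        if (needed <= 1) {
            return 1;
        }
        return Integer.highestOneBit(needed - 1) << 1;
    }

    // Returns a buffer from the matching bucket, limited to the requested size.
    // A real pool would look up a queue for this bucket here instead of allocating.
    static ByteBuffer acquire(int needed) {
        ByteBuffer buffer = ByteBuffer.allocate(bucketCapacity(needed));
        buffer.limit(needed); // writes past 'needed' now fail with BufferOverflowException
        return buffer;
    }
}
```

Bucketing keeps the number of distinct capacities small, so a released buffer is likely to fit a later request, at the cost of wasting up to roughly half of each buffer.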

  

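Also, is there a better way to pack my header and data into one byte array compared to what I am doing?

Yes. Pass around the byte buffer instance instead of having methods return byte arrays that are discarded right after their length is checked or their contents are copied.

  

I also need to make sure it is thread safe, since it can be called by multiple threads.

As long as each buffer is used by only one thread at a time, with proper synchronization, you don't have to worry.

Proper synchronization means that your pool manager has acquire and release semantics in its methods, and that if a buffer is used by multiple threads between fetching it from the pool and returning it, you add release semantics in the thread that stops using the buffer and acquire semantics in the thread that starts using it. For instance, if you pass the buffer through CompletableFutures, you shouldn't have to worry about this, nor if you communicate explicitly between threads with an Exchanger or a proper implementation of BlockingQueue.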

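From the java.util.concurrent package description:

  

The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:

  • Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.

  • Actions in a thread prior to the submission of a Runnable to an Executor happen-before its execution begins. Similarly for Callables submitted to an ExecutorService.

  • Actions taken by the asynchronous computation represented by a Future happen-before actions subsequent to the retrieval of the result via Future.get() in another thread.

  • Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.

  • For each pair of threads that successfully exchange objects via an Exchanger, actions prior to the exchange() in each thread happen-before those subsequent to the corresponding exchange() in another thread.

  • Actions prior to calling CyclicBarrier.await and Phaser.awaitAdvance (as well as its variants) happen-before actions performed by the barrier action, and actions performed by the barrier action happen-before actions subsequent to a successful return from the corresponding await in other threads.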
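A small sketch of such a hand-off, assuming a single producer thread and an ArrayBlockingQueue as the channel; the class and method names are made up for this illustration. The queue's put/take pair supplies the happens-before edge described above, so no extra locking around the buffer is needed as long as the producer stops touching it after put().

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BufferHandOff {
    // Fills a buffer on a producer thread and reads it on the calling thread.
    static long handOff(long value) {
        BlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES);
            buffer.putLong(value);
            buffer.flip(); // switch from writing to reading before handing off
            try {
                queue.put(buffer); // the producer must not touch the buffer after this
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            ByteBuffer received = queue.take(); // writes made before put() are visible here
            producer.join();
            return received.getLong();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the buffer", e);
        }
    }
}
```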

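Answer 1 (score: 0)

Another way to do it would be via a DataOutputStream around a ByteArrayOutputStream, but you should concentrate your performance tuning on the places where it is needed, and this isn't one of them. Efficiency is not any kind of issue here. The network I/O will dominate by orders of magnitude.

Another good reason to use ByteArrayOutputStream is that you don't have to guess the buffer size in advance: it grows as required.

To keep it thread safe, use only local variables.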
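A minimal sketch of that approach, assuming the per-record data layout from the question (key type, key length, key, timestamp, data size, data). The class and method names are hypothetical; the header could be written the same way with writeByte, writeInt and writeLong.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StreamPackingSketch {
    // Writes one record in the question's data layout; all state is local to the call.
    static byte[] packRecord(byte keyType, byte[] key, long timestamp, byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(); // grows as needed
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(keyType);
            out.writeByte(key.length);
            out.write(key);
            out.writeLong(timestamp);    // big-endian, like ByteBuffer's default order
            out.writeShort(data.length);
            out.write(data);
        } catch (IOException e) {
            // Not expected for an in-memory stream, but the API declares it.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // a copy sized to exactly what was written
    }
}
```

Since both streams are created inside the method and never shared, concurrent callers cannot interfere with each other.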