I have a header and data which I need to represent in one Byte Array. I have a particular format for packing the header into a Byte Array, and a different format for packing the data into a Byte Array. After I have those two, I need to combine them into one final Byte Array.

Below is the layout as it is defined in C++, and accordingly I have to define it in Java.
// below is my header offsets layout
// addressedCenter must be the first byte
static constexpr uint32_t addressedCenter = 0;
static constexpr uint32_t version = addressedCenter + 1;
static constexpr uint32_t numberOfRecords = version + 1;
static constexpr uint32_t bufferUsed = numberOfRecords + sizeof(uint32_t);
static constexpr uint32_t location = bufferUsed + sizeof(uint32_t);
static constexpr uint32_t locationFrom = location + sizeof(CustomerAddress);
static constexpr uint32_t locationOrigin = locationFrom + sizeof(CustomerAddress);
static constexpr uint32_t partition = locationOrigin + sizeof(CustomerAddress);
static constexpr uint32_t copy = partition + 1;
// this is the full size of the header
static constexpr uint32_t headerOffset = copy + 1;
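
For reference, a direct Java mirror of these offsets could look like the sketch below (an illustration only, not part of the original post; it restates the C++ constants using Integer.BYTES and Long.BYTES from Java 8, with Long.BYTES standing in for sizeof(CustomerAddress), which is a uint64_t as shown just below):

// Java mirror of the C++ header offsets above (sketch only)
public final class HeaderOffsets {
    // addressedCenter must be the first byte
    public static final int ADDRESSED_CENTER = 0;
    public static final int VERSION = ADDRESSED_CENTER + 1;
    public static final int NUMBER_OF_RECORDS = VERSION + 1;
    public static final int BUFFER_USED = NUMBER_OF_RECORDS + Integer.BYTES;
    public static final int LOCATION = BUFFER_USED + Integer.BYTES;
    public static final int LOCATION_FROM = LOCATION + Long.BYTES;      // sizeof(CustomerAddress) == 8
    public static final int LOCATION_ORIGIN = LOCATION_FROM + Long.BYTES;
    public static final int PARTITION = LOCATION_ORIGIN + Long.BYTES;
    public static final int COPY = PARTITION + 1;
    // this is the full size of the header
    public static final int HEADER_OFFSET = COPY + 1;                   // = 36 bytes

    private HeaderOffsets() {}
}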
And CustomerAddress is a typedef of uint64_t, and it is made up like this -
typedef uint64_t CustomerAddress;
void client_data(uint8_t datacenter,
                 uint16_t clientId,
                 uint8_t dataId,
                 uint32_t dataCounter,
                 CustomerAddress& customerAddress)
{
    customerAddress = (uint64_t(datacenter) << 56)
                    + (uint64_t(clientId) << 40)
                    + (uint64_t(dataId) << 32)
                    + dataCounter;
}
And below is my data layout -
// below is my data layout -
//
// key type - 1 byte
// key len - 1 byte
// key (variable size = key_len)
// timestamp (sizeof uint64_t)
// data size (sizeof uint16_t)
// data (variable size = data size)
Problem Statement:

Now for a part of the project, I am trying to represent the whole thing in one particular class in Java, so that I can just pass the necessary fields and it can make me the final Byte Array, which will have the header first and then the data:

Below is my DataFrame class:
public final class DataFrame {
    private final byte addressedCenter;
    private final byte version;
    private final Map<byte[], byte[]> keyDataHolder;
    private final long location;
    private final long locationFrom;
    private final long locationOrigin;
    private final byte partition;
    private final byte copy;

    public DataFrame(byte addressedCenter, byte version,
            Map<byte[], byte[]> keyDataHolder, long location, long locationFrom,
            long locationOrigin, byte partition, byte copy) {
        this.addressedCenter = addressedCenter;
        this.version = version;
        this.keyDataHolder = keyDataHolder;
        this.location = location;
        this.locationFrom = locationFrom;
        this.locationOrigin = locationOrigin;
        this.partition = partition;
        this.copy = copy;
    }

    public byte[] serialize() {
        // All of the data is embedded in a binary array with fixed maximum size 70000
        ByteBuffer byteBuffer = ByteBuffer.allocate(70000);
        byteBuffer.order(ByteOrder.BIG_ENDIAN);

        int numOfRecords = keyDataHolder.size();
        int bufferUsed = getBufferUsed(keyDataHolder); // 36 + dataSize + 1 + 1 + keyLength + 8 + 2;

        // header layout
        byteBuffer.put(addressedCenter); // byte
        byteBuffer.put(version); // byte
        byteBuffer.putInt(numOfRecords); // int
        byteBuffer.putInt(bufferUsed); // int
        byteBuffer.putLong(location); // long
        byteBuffer.putLong(locationFrom); // long
        byteBuffer.putLong(locationOrigin); // long
        byteBuffer.put(partition); // byte
        byteBuffer.put(copy); // byte

        // now the data layout
        for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
            byte keyType = 0;
            byte keyLength = (byte) entry.getKey().length;
            byte[] key = entry.getKey();
            byte[] data = entry.getValue();
            short dataSize = (short) data.length;

            ByteBuffer dataBuffer = ByteBuffer.wrap(data);
            long timestamp = 0;
            if (dataSize > 10) {
                timestamp = dataBuffer.getLong(2);
            }

            byteBuffer.put(keyType);
            byteBuffer.put(keyLength);
            byteBuffer.put(key);
            byteBuffer.putLong(timestamp);
            byteBuffer.putShort(dataSize);
            byteBuffer.put(data);
        }
        return byteBuffer.array();
    }

    private int getBufferUsed(final Map<byte[], byte[]> keyDataHolder) {
        int size = 36;
        for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
            size += 1 + 1 + 8 + 2;
            size += entry.getKey().length;
            size += entry.getValue().length;
        }
        return size;
    }
}
And here is how I am using the above DataFrame class:
public static void main(String[] args) throws IOException {
    // header layout
    byte addressedCenter = 0;
    byte version = 1;

    long location = packCustomerAddress((byte) 12, (short) 13, (byte) 32, (int) 120);
    long locationFrom = packCustomerAddress((byte) 21, (short) 23, (byte) 41, (int) 130);
    long locationOrigin = packCustomerAddress((byte) 21, (short) 24, (byte) 41, (int) 140);
    byte partition = 3;
    byte copy = 0;

    // this map will have key as the actual key and value as the actual data, both in byte array
    // for now I am storing only two entries in this map
    Map<byte[], byte[]> keyDataHolder = new HashMap<byte[], byte[]>();
    for (int i = 1; i <= 2; i++) {
        keyDataHolder.put(generateKey(), getMyData());
    }

    DataFrame records =
        new DataFrame(addressedCenter, version, keyDataHolder, location, locationFrom,
            locationOrigin, partition, copy);

    // this will give me final packed byte array
    // which will have header and data in it.
    byte[] packedArray = records.serialize();
}

private static long packCustomerAddress(byte datacenter, short clientId, byte dataId,
        int dataCounter) {
    return ((long) (datacenter) << 56) | ((long) clientId << 40) | ((long) dataId << 32)
        | ((long) dataCounter);
}
As you can see in my DataFrame class, I am allocating a ByteBuffer with a predefined size of 70000. Is there a better way by which I can allocate the size I am using while making the ByteBuffer, instead of using a hardcoded 70000?

Also, is there any better way as compared to what I am doing to pack my header and data in one byte array? I also need to make sure it is thread safe, since it can be called by multiple threads.
Answer 0 (score: 2)

Is there a better way by which I can allocate the size I am using while making the ByteBuffer instead of using a hardcoded 70000?
There are at least two non-overlapping approaches. You may use both.

One is buffer pooling. You should find out how many buffers you need during peak periods, and use a maximum above that, e.g. max + max / 2, max + average, max + mode, or 2 * max.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.function.Consumer;
import java.util.function.Function;

public class ByteBufferPool {
    private final int bufferCapacity;
    private final LinkedBlockingDeque<ByteBuffer> queue;

    public ByteBufferPool(int limit, int bufferCapacity) {
        if (limit < 0) throw new IllegalArgumentException("limit must not be negative.");
        if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");

        this.bufferCapacity = bufferCapacity;
        this.queue = (limit == 0) ? null : new LinkedBlockingDeque<>(limit);
    }

    public ByteBuffer acquire() {
        ByteBuffer buffer = (queue == null) ? null : queue.pollFirst();
        if (buffer == null) {
            buffer = ByteBuffer.allocate(bufferCapacity);
        }
        else {
            buffer.clear();
            buffer.order(ByteOrder.BIG_ENDIAN);
        }
        return buffer;
    }

    public boolean release(ByteBuffer buffer) {
        if (buffer == null) throw new IllegalArgumentException("buffer must not be null.");
        if (buffer.capacity() != bufferCapacity) throw new IllegalArgumentException("buffer has unsupported capacity.");
        if (buffer.isDirect()) throw new IllegalArgumentException("buffer must not be direct.");
        if (buffer.isReadOnly()) throw new IllegalArgumentException("buffer must not be read-only.");

        return (queue == null) ? false : queue.offerFirst(buffer);
    }

    public void withBuffer(Consumer<ByteBuffer> action) {
        if (action == null) throw new IllegalArgumentException("action must not be null.");

        ByteBuffer buffer = acquire();
        try {
            action.accept(buffer);
        }
        finally {
            release(buffer);
        }
    }

    public <T> T withBuffer(Function<ByteBuffer, T> function) {
        if (function == null) throw new IllegalArgumentException("function must not be null.");

        ByteBuffer buffer = acquire();
        try {
            return function.apply(buffer);
        }
        finally {
            release(buffer);
        }
    }

    public <T> CompletionStage<T> withBufferAsync(Function<ByteBuffer, CompletionStage<T>> asyncFunction) {
        if (asyncFunction == null) throw new IllegalArgumentException("asyncFunction must not be null.");

        ByteBuffer buffer = acquire();
        CompletionStage<T> future = null;
        try {
            future = asyncFunction.apply(buffer);
        }
        finally {
            if (future == null) {
                release(buffer);
            }
            else {
                future = future.whenComplete((result, throwable) -> release(buffer));
            }
        }
        return future;
    }
}
The withBuffer methods allow direct usage of the pool, while acquire and release allow the acquisition and release points to be separated.
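For example, a minimal usage sketch of the pool above might look like this (untested; the limit of 8 and the capacity of 70000 are placeholder numbers):

import java.nio.ByteBuffer;

public class PoolUsageExample {
    public static void main(String[] args) {
        // create a pool that keeps at most 8 reusable buffers of 70000 bytes each
        ByteBufferPool pool = new ByteBufferPool(8, 70000);

        // scoped usage: the buffer is acquired, passed to the function and released automatically
        byte[] packed = pool.withBuffer(buffer -> {
            buffer.put((byte) 1);             // ... header and data would be written here ...
            byte[] result = new byte[buffer.position()];
            buffer.flip();
            buffer.get(result);               // copy out only the bytes actually written
            return result;
        });
        System.out.println("packed " + packed.length + " byte(s)");

        // split usage: acquire now, release later (possibly in another method)
        ByteBuffer acquired = pool.acquire();
        try {
            acquired.putLong(42L);
            // ... hand the buffer to some writer ...
        } finally {
            pool.release(acquired);
        }
    }
}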
The other is segregating the serialization interface, e.g. put, putInt and putLong, behind which you can then provide a byte-counting implementation and an actual byte-buffering implementation. You should add a method to such an interface so the caller knows whether the serializer is counting bytes or buffering, to avoid generating the bytes unnecessarily, and another method to increment the byte count directly, which is useful when calculating the size of a string in some encoding without actually serializing it:
public interface ByteSerializer {
    ByteSerializer put(byte value);
    ByteSerializer putInt(int value);
    ByteSerializer putLong(long value);
    boolean isSerializing();
    ByteSerializer add(int bytes);
    int position();
}
public class ByteCountSerializer implements ByteSerializer {
    private int count = 0;

    @Override
    public ByteSerializer put(byte value) {
        count += 1;
        return this;
    }

    @Override
    public ByteSerializer putInt(int value) {
        count += 4;
        return this;
    }

    @Override
    public ByteSerializer putLong(long value) {
        count += 8;
        return this;
    }

    @Override
    public boolean isSerializing() {
        return false;
    }

    @Override
    public ByteSerializer add(int bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");
        count += bytes;
        return this;
    }

    @Override
    public int position() {
        return count;
    }
}
import java.nio.ByteBuffer;

public class ByteBufferSerializer implements ByteSerializer {
    private final ByteBuffer buffer;

    public ByteBufferSerializer(int bufferCapacity) {
        if (bufferCapacity < 0) throw new IllegalArgumentException("bufferCapacity must not be negative.");

        this.buffer = ByteBuffer.allocate(bufferCapacity);
    }

    @Override
    public ByteSerializer put(byte value) {
        buffer.put(value);
        return this;
    }

    @Override
    public ByteSerializer putInt(int value) {
        buffer.putInt(value);
        return this;
    }

    @Override
    public ByteSerializer putLong(long value) {
        buffer.putLong(value);
        return this;
    }

    @Override
    public boolean isSerializing() {
        return true;
    }

    @Override
    public ByteSerializer add(int bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must not be negative.");
        for (int b = 0; b < bytes; b++) {
            buffer.put((byte) 0);
        }
        return this;
        // or throw new UnsupportedOperationException();
    }

    @Override
    public int position() {
        return buffer.position();
    }

    public ByteBuffer buffer() {
        return buffer;
    }
}
In your code, you would do something along these lines (not tested):
ByteCountSerializer counter = new ByteCountSerializer();
dataFrame.serialize(counter);
ByteBufferSerializer serializer = new ByteBufferSerializer(counter.position());
dataFrame.serialize(serializer);
ByteBuffer buffer = serializer.buffer();
// ... write buffer, ?, profit ...
Your DataFrame.serialize method should be refactored to accept a ByteSerializer, and wherever it would generate data, it should check isSerializing to know whether it should only calculate the size or actually write bytes.
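A minimal sketch of what that refactoring could look like (untested, and it assumes ByteSerializer is extended with putShort(short) and put(byte[]) methods in the same chaining style as put, putInt and putLong):

// Sketch only: assumes ByteSerializer also offers putShort(short) and put(byte[]).
public void serialize(ByteSerializer out) {
    // header layout
    out.put(addressedCenter)
       .put(version)
       .putInt(keyDataHolder.size())
       .putInt(getBufferUsed(keyDataHolder))
       .putLong(location)
       .putLong(locationFrom)
       .putLong(locationOrigin)
       .put(partition)
       .put(copy);

    // data layout
    for (Map.Entry<byte[], byte[]> entry : keyDataHolder.entrySet()) {
        byte[] key = entry.getKey();
        byte[] data = entry.getValue();

        long timestamp = 0;
        if (out.isSerializing() && data.length > 10) {
            // extract the timestamp only when bytes are actually being written
            timestamp = ByteBuffer.wrap(data).getLong(2);
        }

        out.put((byte) 0)                  // key type
           .put((byte) key.length)         // key len
           .put(key)                       // key
           .putLong(timestamp)             // timestamp
           .putShort((short) data.length)  // data size
           .put(data);                     // data
    }
}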
I leave combining both approaches as an exercise, mostly because it depends a lot on how you decide to do it.

For instance, you could have ByteBufferSerializer use the pool directly and keep an arbitrary capacity (e.g. your 70000), you could pool ByteBuffers by capacity (but instead of the needed capacity, use the least power of 2 greater than the needed capacity, and set the buffer's limit before returning from acquire), or you could pool ByteBufferSerializers directly as long as you add a reset() method.
Is there any better way as compared to what I am doing to pack my header and data in one byte array?

Yes. Pass the byte-buffering instance around, instead of having methods return byte arrays that get discarded the moment their length has been checked or their contents copied.
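In other words, something along the lines of the hypothetical sketch below, where serializeInto(ByteBuffer) is an assumed variant of the DataFrame serialization that writes into a caller-supplied buffer, and the pool is the ByteBufferPool from earlier:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical example of passing the buffer in instead of returning byte[].
// DataFrame.serializeInto(ByteBuffer) is assumed to write header + records
// into the supplied buffer, in the same order as the original serialize().
public final class FrameWriter {
    private final ByteBufferPool pool;

    public FrameWriter(ByteBufferPool pool) {
        this.pool = pool;
    }

    public void write(DataFrame frame, WritableByteChannel channel) throws IOException {
        ByteBuffer buffer = pool.acquire();
        try {
            frame.serializeInto(buffer);   // assumed refactored method
            buffer.flip();                 // switch the buffer from writing to reading
            while (buffer.hasRemaining()) {
                channel.write(buffer);     // no intermediate byte[] copy
            }
        } finally {
            pool.release(buffer);
        }
    }
}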
I also need to make sure it is thread safe since it can be called by multiple threads.

As long as each buffer is used by only one thread at a time, with proper synchronization, you don't have to worry.

Proper synchronization means your pool manager has acquire and release semantics in its methods, and that if a buffer is used by multiple threads between acquiring it from and returning it to the pool, you add release semantics in the thread that stops using the buffer and acquire semantics in the thread that starts using it. For instance, if you pass the buffer through a CompletableFuture you don't have to worry about this, or if you communicate explicitly between threads with an Exchanger or a proper implementation of BlockingQueue (see the sketch after the quoted package description below).
From the java.util.concurrent package description:

The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:

- Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.

- Actions in a thread prior to the submission of a Runnable to an Executor happen-before its execution begins. Similarly for Callables submitted to an ExecutorService.

- Actions taken by the asynchronous computation represented by a Future happen-before actions subsequent to the retrieval of the result via Future.get() in another thread.

- Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.

- For each pair of threads that successfully exchange objects via an Exchanger, actions prior to the exchange() in each thread happen-before those subsequent to the corresponding exchange() in another thread.

- Actions prior to calling CyclicBarrier.await and Phaser.awaitAdvance (as well as its variants) happen-before actions performed by the barrier action, and actions performed by the barrier action happen-before actions subsequent to a successful return from the corresponding await in other threads.
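To make that hand-off pattern concrete, here is a minimal sketch (untested) in which a single producer serializes into a pooled buffer and a single consumer writes it out and releases it, with an ArrayBlockingQueue providing the happen-before edge described above:

import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public final class BufferHandOffExample {
    public static void main(String[] args) throws InterruptedException {
        ByteBufferPool pool = new ByteBufferPool(4, 70000);
        BlockingQueue<ByteBuffer> handOff = new ArrayBlockingQueue<>(4);

        Thread producer = new Thread(() -> {
            ByteBuffer buffer = pool.acquire();
            buffer.putLong(System.currentTimeMillis()); // stand-in for the serialized frame
            buffer.flip();
            try {
                handOff.put(buffer); // release semantics: the writes above become visible to the taker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                pool.release(buffer);
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                ByteBuffer buffer = handOff.take(); // acquire semantics
                System.out.println("received " + buffer.getLong()); // safe to read the producer's writes
                pool.release(buffer);               // return the buffer to the pool
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}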
Answer 1 (score: 0)
Another way to do it would be via a DataOutputStream around a ByteArrayOutputStream, but you should concentrate your performance tuning where it is needed, and this isn't it. Efficiency isn't any kind of issue here; the network I/O will dominate by orders of magnitude.

Another reason for using a ByteArrayOutputStream is that you don't have to guess the buffer size in advance: it grows as necessary.

To keep it thread-safe, use only local variables.
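As an illustration, a minimal sketch of the header written through that stream pair could look like this (only the header fields are shown; the per-record data layout would follow the same pattern with writeByte, write(byte[]), writeLong and writeShort):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch: serialization built on a growable ByteArrayOutputStream instead of a
// fixed-size ByteBuffer. All state is local, so the method is trivially thread-safe.
public static byte[] serializeWithStreams(byte addressedCenter, byte version,
        int numOfRecords, int bufferUsed, long location, long locationFrom,
        long locationOrigin, byte partition, byte copy) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);

    // header layout (DataOutputStream writes big-endian, matching the ByteBuffer version)
    out.writeByte(addressedCenter);
    out.writeByte(version);
    out.writeInt(numOfRecords);
    out.writeInt(bufferUsed);
    out.writeLong(location);
    out.writeLong(locationFrom);
    out.writeLong(locationOrigin);
    out.writeByte(partition);
    out.writeByte(copy);

    // ... the per-record data layout would be appended here ...

    out.flush();
    return bytes.toByteArray(); // exactly as many bytes as were written, no guessing
}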