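I am looking for a class that lets me add items for processing and that performs some action once the item count reaches the batch size. I would use it something like this: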
    Batcher<Token> batcher = new Batcher<Token>(500, Executors.newFixedThreadPool(4)) {
        public void onFlush(List<Token> tokens) {
            rest.notifyBatch(tokens);
        }
    };
    tokens.forEach((t)->batcher.add(t));
    batcher.awaitDone();
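After #awaitDone I know that all tokens have been notified. #onFlush might do anything; for example, I might want to batch inserts into a database. I want the #onFlush calls to be handed to an Executor.
I came up with a solution, but it seems like a lot of code, so my question is: is there a better way to do this? Is there an existing class, or a better way to implement this than the class I wrote? My solution seems to have a lot of moving parts.
Here is the code I came up with: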
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.Executor;
    import java.util.concurrent.Phaser;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * Simple class to allow the batched processing of items and then to optionally wait
     * for all batches to be completed.
     */
    public abstract class Batcher<T> {
        private final int batchSize;
        private final ArrayBlockingQueue<T> batch;
        private final Executor executor;
        // One party is registered up front for the thread that calls awaitDone().
        private final Phaser phaser = new Phaser(1);
        private final AtomicInteger processed = new AtomicInteger(0);

        public Batcher(int batchSize, Executor executor) {
            this.batchSize = batchSize;
            this.executor = executor;
            this.batch = new ArrayBlockingQueue<>(batchSize);
        }

        public void add(T item) {
            processed.incrementAndGet();
            // offer() fails when the queue is full; flush and retry.
            while (!batch.offer(item)) {
                flush();
            }
        }

        public void addAll(Iterable<T> items) {
            for (T item : items) {
                add(item);
            }
        }

        public int getProcessedCount() {
            return processed.get();
        }

        public void flush() {
            if (batch.isEmpty())
                return;
            final List<T> batched = new ArrayList<>(batchSize);
            batch.drainTo(batched, batchSize);
            if (!batched.isEmpty())
                executor.execute(new PhasedRunnable(batched));
        }

        public abstract void onFlush(List<T> batch);

        public void awaitDone() {
            flush();
            phaser.arriveAndAwaitAdvance();
        }

        public void awaitDone(long duration, TimeUnit unit) throws TimeoutException {
            flush();
            try {
                phaser.awaitAdvanceInterruptibly(phaser.arrive(), duration, unit);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private class PhasedRunnable implements Runnable {
            private final List<T> batch;

            private PhasedRunnable(List<T> batch) {
                this.batch = batch;
                // Register a party for this batch before it is submitted.
                phaser.register();
            }

            @Override
            public void run() {
                try {
                    onFlush(batch);
                }
                finally {
                    // Arrive even if onFlush throws, so awaitDone() cannot hang.
                    phaser.arrive();
                }
            }
        }
    }
A Java 8 solution would be great. Thanks.
Answer 0 (score: 1)
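What strikes me is that your code doesn't work with multiple threads adding items to a single Batcher instance. If we turn this limitation into the specified use case, there is no need to use specialized concurrent classes internally. We can accumulate into an ordinary ArrayList and, when the capacity is exhausted, swap that list for a new one, without copying the items. This simplifies the code to: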
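    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Executor;
    import java.util.concurrent.Phaser;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.function.Consumer;

    // Not thread-safe: items must be added from a single thread.
    public class Batcher<T> implements Consumer<T> {
        private final int batchSize;
        private final Executor executor;
        private final Consumer<List<T>> actualAction;
        // One party is registered up front for the thread that calls awaitDone().
        private final Phaser phaser = new Phaser(1);
        private ArrayList<T> batch;
        private int processed;

        public Batcher(int batchSize, Executor executor, Consumer<List<T>> c) {
            this.batchSize = batchSize;
            this.executor = executor;
            this.actualAction = c;
            this.batch = new ArrayList<>(batchSize);
        }

        @Override
        public void accept(T item) {
            processed++;
            // Hand off a full batch before adding the next item.
            if (batch.size() == batchSize) flush();
            batch.add(item);
        }

        public int getProcessedCount() {
            return processed;
        }

        public void flush() {
            List<T> current = batch;
            if (batch.isEmpty())
                return;
            // Swap in a fresh list; the old one goes to the executor without copying.
            batch = new ArrayList<>(batchSize);
            phaser.register();
            executor.execute(() -> {
                try {
                    actualAction.accept(current);
                }
                finally {
                    // Arrive even if the action throws, so awaitDone() cannot hang.
                    phaser.arrive();
                }
            });
        }

        public void awaitDone() {
            flush();
            phaser.arriveAndAwaitAdvance();
        }

        public void awaitDone(long duration, TimeUnit unit) throws TimeoutException {
            flush();
            try {
                phaser.awaitAdvanceInterruptibly(phaser.arrive(), duration, unit);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }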
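Regarding Java 8 specific improvements, it uses a Consumer, which allows the final action to be specified via a lambda expression, without needing to subclass Batcher. Further, the PhasedRunnable is replaced by a lambda expression. As another simplification, having Batcher<T> implement Consumer<T> removes the need for an addAll method, since every Iterable supports forEach(Consumer<? super T>).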
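To make the last two points concrete, here is a minimal sketch of a Consumer that collects items by swapping lists and is fed directly through Iterable.forEach. ChunkCollector and ForEachDemo are hypothetical names used only for illustration; they are not part of the Batcher class above, and no executor is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration class, not part of the answer's Batcher.
class ChunkCollector<T> implements Consumer<T> {
    private final int chunkSize;
    private final List<List<T>> chunks = new ArrayList<>();
    private List<T> current = new ArrayList<>();

    ChunkCollector(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    @Override
    public void accept(T item) {
        // Same ordering as Batcher.accept(): hand off a full list first, then add.
        if (current.size() == chunkSize) finish();
        current.add(item);
    }

    // Stores the current list as a chunk and swaps in a fresh one (no copying).
    List<List<T>> finish() {
        if (!current.isEmpty()) {
            chunks.add(current);
            current = new ArrayList<>();
        }
        return chunks;
    }
}

class ForEachDemo {
    static List<List<Integer>> chunk(List<Integer> items, int size) {
        ChunkCollector<Integer> collector = new ChunkCollector<>(size);
        // Because the collector is a Consumer, no addAll method is needed.
        items.forEach(collector);
        return collector.finish();
    }

    public static void main(String[] args) {
        // prints [[1, 2, 3], [4, 5, 6], [7]]
        System.out.println(chunk(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3));
    }
}
```

Note that the last chunk is only handed over by the explicit finish() call, just as the last batch in Batcher is only flushed by awaitDone().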
So the use case for Batcher now looks like:
    Batcher<Token> batcher = new Batcher<>(
        500, Executors.newFixedThreadPool(4), currTokens -> rest.notifyBatch(currTokens));
    tokens.forEach(batcher);
    batcher.awaitDone();
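The completion tracking behind awaitDone() can also be looked at in isolation. The following is a minimal sketch under stated assumptions: PhaserTrackingDemo and runTasks are hypothetical names, and a simple counter stands in for the real batch action. It follows the same pattern as the code above: one party for the coordinating thread, one registered party per submitted task, and an arrive() in a finally block.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class, not part of the answer's Batcher.
class PhaserTrackingDemo {
    static int runTasks(int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // One party for the coordinating thread, as in new Phaser(1) above.
        Phaser phaser = new Phaser(1);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < taskCount; i++) {
                // Register before submitting, so the phase cannot advance early.
                phaser.register();
                pool.execute(() -> {
                    try {
                        completed.incrementAndGet(); // stand-in for the batch action
                    } finally {
                        phaser.arrive(); // always arrive, even on failure
                    }
                });
            }
            // Blocks until every registered task has arrived.
            phaser.arriveAndAwaitAdvance();
            return completed.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10)); // prints 10
    }
}
```

Because the coordinating thread does not arrive until the end, the phase cannot advance while tasks are still being registered, which is why the count is complete once arriveAndAwaitAdvance() returns.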