我有以下工作流程:
有n
条记录需要通过网络进行检索,随后需要n
进行昂贵的计算。输入代码,如下所示:
List<Integer> ids = {1,2,....n};
ids.forEach(id -> {
Record r = RetrieveRecord(id); // Blocking IO
ProcessRecord(r); // CPU Intensive
})
我想将阻塞部分转换为异步,以便通过单个线程将时间减至最少-本质上是通过确保在处理记录i+1
时检索到记录i
来实现的。这样执行将如下所示:
Retrieve(1).start()
Retrieve(1).onEnd(() -> { start Retrieve(2), Process(1) })
Retrieve(2).onEnd(() -> { start Retrieve(3), Process(2) })
....
现在,我可以想出一个天真的方法来用List<>
和CompletableFuture
来实现这一点,但这需要我以不同的方式处理第一条记录。
是否有更优雅的方法来解决此问题,例如反应流?
一种解决方案可能会让我轻松地配置多少记录Process()
可以落后于Retreive()
?
答案 0 :(得分:1)
因此,您有N个任务,并且希望并行运行它们,但最多只能同时运行K个任务。最自然的方法是最初具有一个任务生成器和一个具有K个权限的权限计数器。任务生成器创建K个任务,并等待更多权限。每个权限归某个任务所有,并在任务结束时返回。 Java中的标准权限计数器是类java.util.concurrent.Semaphore
:
List<Integer> ids = {1,2,....n};
Semaphore sem = new Semaphore(K);
ids.forEach(id -> {
sem.aquire();
CompletableFuture<Data> fut = Retrieve(id);
fut.thenRun(sem::release);
fut.thenAcceptAsync(this::ProcessRecord, someExecutor);
})
由于任务生成器仅占用一个线程,因此使其异步是没有意义的。但是,如果您不想为任务生成器使用专用线程并希望实现异步解决方案,那么主要问题是哪个类可以扮演异步权限计数器的角色。您有3个选择:
org.df4j.core.boundconnector.permitstream.Semafor
答案 1 :(得分:0)
这是我最终想到的,似乎可以完成工作:
Flowable.just(1,2,3,4,5,6) // Completes in 1 + 6 * 3 = 19 secs
.concatMapEager(v->
Flowable.just(v)
.subscribeOn(Schedulers.io())
.map( e->{
System.out.println(getElapsed("Req " + e + " started");
Thread.sleep(1000); // Network: 1 sec
System.out.println(getElapsed("Req " + e + " done");
return e;
}, requestsOnWire, 1) // requestsOnWire = K = 2
.blockingSubscribe(new DisposableSubscriber<Integer>() {
@Override
protected void onStart() {
request(1);
}
@Override
public void onNext(Integer s) {
request(1);
System.out.println("Proc " + s + " started");
try {
Thread.sleep(3000); // Compute: 3 secs
System.out.println("Proc " + s + " done");
} catch (InterruptedException e) {
e.printStackTrace();
}
}
@Override
public void onError(Throwable t) {
}
@Override
public void onComplete() {
}
});
下面是执行顺序。请注意,在任何给定的时间点,有1条记录正在处理,最多2条在线请求和2条内存中最多未处理的记录(进程落后K = 2)
0 secs: Req 1 started
: Req 2 started
1 secs: Req 2 done
: Req 1 done
: Proc 1 started
: Req 3 started
: Req 4 started
2 secs: Req 3 done
: Req 4 done
4 secs: Proc 1 done
: Proc 2 started
: Req 5 started
5 secs: Req 5 done
7 secs: Proc 2 done
: Proc 3 started
: Req 6 started
8 secs: Req 6 done
10 secs: Proc 3 done
: Proc 4 started
13 secs: Proc 4 done
: Proc 5 started
16 secs: Proc 5 done
: Proc 6 started
19 secs: Proc 6 done
希望这里没有反模式/陷阱。
答案 2 :(得分:0)
使用df4j和显式异步信号量的解决方案:
import org.df4j.core.boundconnector.permitstream.Semafor;
import org.df4j.core.tasknode.Action;
import org.df4j.core.tasknode.messagestream.Actor;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
public class AsyncSemaDemo extends Actor {
List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
Semafor sema = new Semafor(this, 2);
Iterator<Integer> iter = ids.iterator();
int tick = 100; // millis
CountDownLatch done = new CountDownLatch(ids.size());
long start = System.currentTimeMillis();
private void printClock(String s) {
long ticks = (System.currentTimeMillis() - start)/tick;
System.out.println(Long.toString(ticks) + " " + s);
}
CompletableFuture<Integer> Retrieve(Integer e) {
return CompletableFuture.supplyAsync(() -> {
printClock("Req " + e + " started");
try {
Thread.sleep(tick); // Network
} catch (InterruptedException ex) {
}
printClock(" Req " + e + " done");
return e;
}, executor);
}
void ProcessRecord(Integer s) {
printClock(" Proc " + s + " started");
try {
Thread.sleep(tick*2); // Compute
} catch (InterruptedException ex) {
}
printClock(" Proc " + s + " done");
}
@Action
public void act() {
if (iter.hasNext()) {
CompletableFuture<Integer> fut = Retrieve(iter.next());
fut.thenRun(sema::release);
fut.thenAcceptAsync(this::ProcessRecord, executor)
.thenRun(done::countDown);
} else {
super.stop();
}
}
public static void main(String[] args) throws InterruptedException {
AsyncSemaDemo asyncSemaDemo = new AsyncSemaDemo();
asyncSemaDemo.start(ForkJoinPool.commonPool());
asyncSemaDemo.done.await();
}
}
其日志应为:
0 Req 1 started
0 Req 2 started
1 Req 1 done
1 Proc 1 started
1 Req 3 started
1 Req 2 done
1 Proc 2 started
1 Req 4 started
2 Req 3 done
2 Proc 3 started
2 Req 5 started
2 Req 4 done
2 Proc 4 started
3 Proc 1 done
3 Req 5 done
3 Proc 5 started
3 Proc 2 done
4 Proc 3 done
4 Proc 4 done
5 Proc 5 done
请注意,该解决方案与我以前对标准java.util.concurrent.Semaphore.
的回答如何接近