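I want to read an Excel sheet row by row, where a particular cell in each row contains a URL. I need to process these URLs by accessing the websites programmatically. Since visiting each cell one after another in a single-threaded model would be very slow, I am planning something like this:
Step 1: Read the URL cell of the nth row of the Excel sheet.
Step 2: nThreads++
Step 3: If nThreads == MAX_NO_OF_THREADS, sleep until one of the threads has finished;
else instantiate a thread to process the URL in that cell.
Step 4: Go to Step 1.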
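The plan above can be sketched with standard JDK classes: a `java.util.concurrent.Semaphore` with MAX_NO_OF_THREADS permits plays the role of the nThreads counter, so the reading loop sleeps in `acquire()` until a running task calls `release()`. This is a minimal sketch under stated assumptions, not a drop-in solution: it assumes the URL cells have already been read into a `List<String>`, and `ThrottledUrlReader`, `processAll` and `processUrl` are hypothetical names, with `processUrl` only sleeping in place of the real website access.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledUrlReader {

    // Processes every URL cell with at most maxThreads tasks running at once.
    // Returns the number of URLs processed.
    static int processAll(List<String> urlCells, int maxThreads) {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore slots = new Semaphore(maxThreads); // stands in for nThreads / MAX_NO_OF_THREADS
        AtomicInteger processed = new AtomicInteger();
        try {
            for (String url : urlCells) {            // Step 1: take the next row's URL cell
                slots.acquire();                     // Steps 2-3: sleep while every slot is busy
                pool.execute(() -> {
                    try {
                        processUrl(url);             // Step 3 (else): handle the URL on a pool thread
                        processed.incrementAndGet();
                    } finally {
                        slots.release();             // a slot is free again, so the reader can continue
                    }
                });
            }                                        // Step 4: back to Step 1
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    // Hypothetical placeholder for visiting the website.
    static void processUrl(String url) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> cells = List.of("http://example.com/1", "http://example.com/2",
                "http://example.com/3", "http://example.com/4", "http://example.com/5");
        System.out.println("processed " + processAll(cells, 2)); // prints "processed 5"
    }
}
```

Releasing the permit in a `finally` block keeps a URL that throws from permanently using up one of the slots.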
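To implement this, I need the following:
1 - Some means of creating a thread pool. I could build one from an array of Thread objects, but I would prefer a better option.
2 - A manager thread that takes threads from the pool, has them do their work, and sleeps until a thread becomes available for the next task.
So what options do I have?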
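Both requirements are covered by the JDK's executor framework: `Executors.newFixedThreadPool(n)` creates n reusable threads (requirement 1), and the pool's internal work queue holds submitted tasks until a thread is idle, which is the manager's job in requirement 2. A minimal sketch, assuming the URLs are already in a list; `FixedPoolUrls`, `processAll` and `visit` are hypothetical names, and `visit` just returns a string instead of contacting the site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FixedPoolUrls {

    // Submits one task per URL to a pool of maxThreads threads; the pool's queue
    // holds the remaining tasks until a thread is free. Returns results in row order.
    static List<String> processAll(List<String> urls, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> visit(url));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed (normally or with an exception).
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return List.of();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical placeholder for fetching and processing one URL.
    static String visit(String url) {
        return "done: " + url;
    }

    public static void main(String[] args) {
        // prints "[done: http://a, done: http://b, done: http://c]"
        System.out.println(processAll(List.of("http://a", "http://b", "http://c"), 2));
    }
}
```

`invokeAll` returns the futures in the same order as the submitted tasks, so the results line up with the sheet's rows even though the tasks may finish in any order.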
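Answer 0 (score: 0)
It is easier to think of this as limiting the number of concurrent tasks. That means using runnables that are fed their input and that know when to stop running. You also need to know when all the tasks have completed, so that you know when all the work is done.
The simplest solution I can come up with for this problem is the following.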
import java.net.URL;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class Q21512025 {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            new Q21512025(executor, 5).readCells();
        } catch (Exception e) {
            e.printStackTrace();
        }
        // shutdownNow() interrupts the workers, which matters if readCells() failed
        // before the end markers were queued.
        executor.shutdownNow();
    }

    private final int maxTasks;
    private final ExecutorService executor;
    private final CountDownLatch finished;              // counted down once by each worker as it exits
    private final LinkedBlockingQueue<ExcellUrlCell> q; // cells waiting to be processed

    public Q21512025(ExecutorService executor, int maxTasks) {
        this.executor = executor;
        this.maxTasks = maxTasks;
        finished = new CountDownLatch(maxTasks);
        q = new LinkedBlockingQueue<ExcellUrlCell>();
    }

    public void readCells() throws Exception {
        // Start exactly maxTasks workers: this caps the number of URLs processed concurrently.
        for (int i = 0; i < maxTasks; i++) {
            executor.execute(new ExcellUrlParser(q, finished));
        }
        // Hand every cell to the workers through the queue.
        ExcellReader reader = new ExcellReader(getExampleUrls(10));
        while (reader.hasNext()) {
            q.add(reader.next());
        }
        // One end marker (a cell without a URL) per worker tells each worker to stop.
        for (int i = 0; i < maxTasks; i++) {
            q.add(new ExcellUrlCell(null));
        }
        System.out.println("Awaiting excell url cell tasks.");
        finished.await(); // returns once every worker has exited
        System.out.println("Done.");
    }

    // Example data standing in for the URL cells read from the Excel sheet.
    private URL[] getExampleUrls(int amount) throws Exception {
        URL[] urls = new URL[amount];
        for (int i = 0; i < amount; i++) {
            urls[i] = new URL("http://localhost:" + (i + 2000) + "/");
        }
        return urls;
    }

    // Worker: takes cells from the queue until it receives an end marker.
    static class ExcellUrlParser implements Runnable {
        private final CountDownLatch finished;
        private final LinkedBlockingQueue<ExcellUrlCell> q;

        public ExcellUrlParser(LinkedBlockingQueue<ExcellUrlCell> q, CountDownLatch finished) {
            this.finished = finished;
            this.q = q;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    ExcellUrlCell urlCell = q.take(); // blocks until a cell is available
                    if (urlCell.isFinished()) {
                        break;
                    }
                    processUrl(urlCell.getUrl());
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                finished.countDown();
            }
        }

        // Placeholder for the real work: a short delay, then print the URL.
        private void processUrl(URL url) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                // Restore the interrupt so the next q.take() throws and the worker exits.
                Thread.currentThread().interrupt();
            }
            System.out.println(url);
        }
    }

    // Iterates over the URL cells; a real version would read them from the sheet.
    static class ExcellReader implements Iterator<ExcellUrlCell> {
        private final URL[] urls;
        private int index;

        public ExcellReader(URL[] urls) {
            this.urls = urls;
        }

        @Override
        public boolean hasNext() {
            return (index < urls.length);
        }

        @Override
        public ExcellUrlCell next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            ExcellUrlCell urlCell = new ExcellUrlCell(urls[index]);
            index++;
            return urlCell;
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    // One URL cell; a null URL is used as the "no more work" marker.
    static class ExcellUrlCell {
        private final URL url;

        public ExcellUrlCell(URL url) {
            this.url = url;
        }

        public URL getUrl() {
            return url;
        }

        public boolean isFinished() {
            return (url == null);
        }
    }
}
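In the code above the reader puts every cell into an unbounded `LinkedBlockingQueue` with `add`, so it never sleeps the way Step 3 of the question intends, and for a large sheet all cells can end up in memory at once. One possible variation, sketched here as an assumption rather than as part of the answer, uses a bounded `ArrayBlockingQueue` and `put`, which blocks the reader while `capacity` cells are already waiting. It keeps the same idea of one end marker per worker plus a `CountDownLatch`; `BoundedCellQueue`, `processAll` and `END` are hypothetical names, the cells are plain strings, and the per-URL work is reduced to a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCellQueue {
    // End marker compared by identity, so no real cell value (not even "END") can match it.
    private static final String END = new String("END");

    // Passes the cells to `workers` threads through a queue holding at most `capacity` cells.
    // Returns how many cells were processed.
    static int processAll(Iterable<String> cells, int workers, int capacity) {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(capacity);
        CountDownLatch finished = new CountDownLatch(workers);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String cell = q.take();      // waits for the next cell
                        if (cell == END) {           // identity check on purpose
                            break;
                        }
                        processed.incrementAndGet(); // placeholder for processUrl(cell)
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finished.countDown();
                }
            });
            worker.setDaemon(true); // do not keep the JVM alive if the reader is interrupted early
            worker.start();
        }
        try {
            for (String cell : cells) {
                q.put(cell);                         // blocks while `capacity` cells are queued
            }
            for (int i = 0; i < workers; i++) {
                q.put(END);                          // one end marker per worker
            }
            finished.await();                        // all workers have exited
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            cells.add("http://localhost:" + (i + 2000) + "/");
        }
        System.out.println("processed " + processAll(cells, 3, 2)); // prints "processed 10"
    }
}
```

A `String` marker is used because `ArrayBlockingQueue`, like `LinkedBlockingQueue`, rejects `null` elements.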
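Answer 1 (score: 0)
How about a thread manager? I happen to maintain several Fork/Join servers on SourceForge.net. You can break a request into separate components and run each component in a separate thread pool (see here for an introduction), or you can dynamically break a request into identical tasks to be run in a thread pool (see here).
These open-source products have been around for years and could save you a lot of effort.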
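For comparison, the JDK ships its own fork/join framework (`java.util.concurrent.ForkJoinPool` with `RecursiveTask`), which can dynamically split a list of URLs into identical subtasks, similar in spirit to the second option in the answer. This is a sketch of that JDK framework, not of the SourceForge products the answer refers to; `ForkJoinUrls`, `UrlChunk`, `process` and the threshold of 2 are hypothetical choices. The `ForkJoinTask` documentation advises that subdividable tasks should not perform blocking I/O, so for network-bound URL fetching a fixed thread pool is usually the more natural fit.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinUrls {

    // Processes urls[from, to) and returns how many were handled,
    // splitting in half until a chunk is no larger than THRESHOLD.
    static class UrlChunk extends RecursiveTask<Integer> {
        private static final int THRESHOLD = 2; // illustrative chunk size
        private final List<String> urls;
        private final int from;
        private final int to;

        UrlChunk(List<String> urls, int from, int to) {
            this.urls = urls;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Integer compute() {
            if (to - from <= THRESHOLD) {
                int done = 0;
                for (int i = from; i < to; i++) {
                    process(urls.get(i));
                    done++;
                }
                return done;
            }
            int mid = (from + to) >>> 1;
            UrlChunk left = new UrlChunk(urls, from, mid);
            UrlChunk right = new UrlChunk(urls, mid, to);
            left.fork();                          // schedule the left half asynchronously
            return right.compute() + left.join(); // do the right half here, then combine
        }
    }

    // Hypothetical placeholder for the per-URL work.
    static void process(String url) {
    }

    static int processAll(List<String> urls, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new UrlChunk(urls, 0, urls.size()));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = List.of("u1", "u2", "u3", "u4", "u5");
        System.out.println("processed " + processAll(urls, 4)); // prints "processed 5"
    }
}
```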