public static void getTestData() {
    try {
        filename = "InventoryData_" + form_id;
        PrintWriter writer = new PrintWriter("/Users/pnroy/Documents/" + filename + ".txt");
        pids = new ArrayList<ProductId>();
        GetData productList = new GetData();
        System.out.println("Getting productId");
        pids = productList.GetProductIds(form_id);
        // Integer division: if pids.size() is not a multiple of numberOfCrawlers,
        // the last pids.size() % numberOfCrawlers records are not assigned to any thread
        int perThreadSize = pids.size() / numberOfCrawlers;
        ArrayList<ArrayList<ProductId>> perThreadData =
                new ArrayList<ArrayList<ProductId>>(numberOfCrawlers);
        for (int i = 1; i <= numberOfCrawlers; i++) {
            perThreadData.add(new ArrayList<ProductId>(perThreadSize));
            for (int j = 0; j < perThreadSize; j++) {
                ProductId source = pids.get((i - 1) * perThreadSize + j);
                ProductId ids = new ProductId();
                ids.setEbProductID(source.getEbProductID());
                ids.setECProductID(source.getECProductID());
                perThreadData.get(i - 1).add(ids);
            }
        }
        BlockingQueue<String> q = new LinkedBlockingQueue<String>();
        Consumer c1 = new Consumer(q);
        Thread[] thread = new Thread[numberOfCrawlers];
        // Valid indices are 0 .. numberOfCrawlers - 1, so the loop uses < rather than <=
        for (int k = 0; k < numberOfCrawlers; k++) {
            GetCombinedData data = new GetCombinedData();
            thread[k] = new Thread(data);
            thread[k].setDaemon(true);
            data.setVal(perThreadData.get(k), filename, q);
            thread[k].start();
            // writer.println(data.getResult());
        }
        new Thread(c1).start();
        for (int l = 0; l < numberOfCrawlers; l++) {
            thread[l].join();
        }
        writer.close();
    } catch (Exception e) {
        e.printStackTrace(); // report errors instead of silently swallowing them
    }
}
Here, the number of crawlers (numberOfCrawlers) is the number of threads.
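The run method of the GetCombinedData class has the following code. pids is passed in from the main method as perThreadData.get(k-1). The CassController class queries an API and, after some processing, returns a String result.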
public void run() {
    try {
        for (int i = 0; i < pids.size(); i++) {
            CassController cass = new CassController();
            String result = cass.getPaginationDetails(pids.get(i));
            queue.put(result);
            Thread.sleep(1000);
        }
        writer.close();
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
Consumer.java has the following code:
public class Consumer implements Runnable {
    private final BlockingQueue<String> queue;
    Consumer(BlockingQueue<String> q) { queue = q; }
    public void run() {
        try {
            while (queue.size() > 0) {
                consume(queue.take());
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
    void consume(Object x) {
        // Opens the file in append mode for each result; try-with-resources closes it
        try (PrintWriter writer = new PrintWriter(new FileWriter("/Users/pnroy/Documents/Inventory", true))) {
            writer.println(x.toString());
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
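One likely problem in the Consumer above is the loop condition `while (queue.size() > 0)`: the consumer is started after the producer threads, and it exits the first time it finds the queue empty, even if producers are still working. A common remedy is a "poison pill", a sentinel value each producer puts on the queue when it finishes, so the consumer blocks on `take()` until it has seen one sentinel per producer. This is a minimal sketch under stated assumptions: the class name `PoisonPillDemo` and the sentinel `DONE` are hypothetical, the producers stand in for GetCombinedData, and results are collected into a list instead of written to the file so they are easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class PoisonPillDemo {
    // Hypothetical sentinel. new String(...) makes it a distinct object and the consumer compares
    // by identity, so a real result that happens to equal "__DONE__" is not mistaken for it.
    static final String DONE = new String("__DONE__");

    // Blocks on take() until one DONE has arrived from every producer.
    static List<String> consume(BlockingQueue<String> queue, int producers) throws InterruptedException {
        List<String> results = new ArrayList<>();
        int finished = 0;
        while (finished < producers) {
            String item = queue.take(); // waits instead of giving up when the queue is briefly empty
            if (item == DONE) {
                finished++;
            } else {
                results.add(item); // in the question, this is where the line would be written to the file
            }
        }
        return results;
    }

    // Starts `producers` threads that each put `itemsPerProducer` results, then consumes on the calling thread.
    static List<String> run(int producers, int itemsPerProducer) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < itemsPerProducer; i++) {
                        queue.put("producer " + id + " item " + i); // stand-in for the real API call
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    queue.offer(DONE); // unbounded queue, so offer() succeeds without blocking
                }
            });
            threads.add(t);
            t.start();
        }
        List<String> results = consume(queue, producers);
        for (Thread t : threads) {
            t.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> results = run(3, 4);
        System.out.println(results.size()); // 3 producers x 4 items = 12
    }
}
```

With a single consumer thread, only one thread ever touches the output, so the file writes need no extra locking.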
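So if I set the number of crawlers to 10 and there are 500 records, each thread processes 50 records. I need to write the results to a file. I'm confused about how to do this, since it's an array of threads and each thread is doing a bunch of operations.
I tried using a blocking queue, but it prints duplicate results. I'm new to multithreading and don't know how to handle this case. Can you suggest something?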
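The arithmetic in the question (500 records / 10 crawlers = 50 per thread) works because 500 is a multiple of 10; with the integer division `pids.size() / numberOfCrawlers` in getTestData, 505 records would leave the last 5 unassigned. A minimal sketch of splitting a list into n consecutive chunks whose sizes differ by at most one; the class `Partitioner` and method `partition` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

final class Partitioner {
    // Splits items into n consecutive chunks; the first (size % n) chunks get one extra element,
    // so every element lands in exactly one chunk.
    static <T> List<List<T>> partition(List<T> items, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        List<List<T>> chunks = new ArrayList<>(n);
        int base = items.size() / n;
        int extra = items.size() % n;
        int start = 0;
        for (int i = 0; i < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            chunks.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 505; i++) records.add(i);
        // 505 records over 10 chunks: five chunks of 51, then five chunks of 50
        for (List<Integer> chunk : partition(records, 10)) {
            System.out.print(chunk.size() + " ");
        }
        System.out.println();
    }
}
```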
Answer 0 (score: -1)
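With the introduction of many useful high-level concurrency classes, it is now recommended not to use the Thread class directly. Even the BlockingQueue class is fairly low-level.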
Instead, you have a nice use case for CompletionService, which is built on top of ExecutorService. The following example shows how to use it.
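You'll want to replace the code in PartialResultTask (where the main processing happens) and the System.out.println call (where you probably want to write the result to a file).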
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class ParallelProcessing {
    public static void main(String[] args) {
        ExecutorService executionService = Executors.newFixedThreadPool(10);
        CompletionService<String> completionService = new ExecutorCompletionService<>(executionService);
        // submit tasks
        for (int i = 0; i < 500; i++) {
            completionService.submit(new PartialResultTask(i));
        }
        // collect results in the order the tasks complete
        for (int i = 0; i < 500; i++) {
            String result = getNextResult(completionService);
            if (result != null)
                System.out.println(result);
        }
        executionService.shutdown();
    }
    private static String getNextResult(CompletionService<String> completionService) {
        Future<String> result = null;
        while (result == null) {
            try {
                result = completionService.take();
            } catch (InterruptedException e) {
                // ignore and retry
            }
        }
        try {
            return result.get();
        } catch (ExecutionException e) {
            e.printStackTrace();
            return null;
        } catch (InterruptedException e) {
            e.printStackTrace();
            return null;
        }
    }
    static class PartialResultTask implements Callable<String> {
        private final int n;
        public PartialResultTask(int n) {
            this.n = n;
        }
        @Override
        public String call() {
            return String.format("Partial result %d", n);
        }
    }
}
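Applied to the question, a sketch along the same lines might submit one task per product ID and let the calling thread, the only writer, append each completed result to the file. Assumptions: `WriteResultsToFile` and `lookup` are hypothetical (`lookup` stands in for `new CassController().getPaginationDetails(...)`), the IDs are plain integers rather than ProductId objects, the output path is a parameter, and lines are written in completion order, not input order.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class WriteResultsToFile {
    // Hypothetical stand-in for the API call made by CassController in the question.
    static String lookup(int productId) {
        return "details for product " + productId;
    }

    // Runs lookup() for every id on a pool of `threads` workers and writes each result to `out`
    // as soon as it completes. Only the calling thread touches the writer, so no locking is needed.
    static void run(List<Integer> productIds, int threads, Path out)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (PrintWriter writer = new PrintWriter(Files.newBufferedWriter(out))) {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);
            for (int id : productIds) {
                cs.submit(() -> lookup(id));
            }
            for (int i = 0; i < productIds.size(); i++) {
                writer.println(cs.take().get()); // take() blocks until the next task finishes
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("inventory", ".txt"); // temp file instead of the question's path
        run(List.of(1, 2, 3, 4, 5), 2, out);
        System.out.println(Files.readAllLines(out));
    }
}
```

Each task's result is taken exactly once, which avoids both the separate Consumer thread and the question of when that consumer should stop.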