Need help with multithreading

Time: 2016-02-10 11:46:29

Tags: java multithreading

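I am building a crawler-like scraping tool that finds images in web pages. The producer generates links, and the consumer connects to each link to look for images. Because the producer generates a large number of links, the consumer takes a long time. So I put the consumer in an executor service, but the time the consumer takes has not improved. Please help. My code is below.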

@Service
@Qualifier("crawlerService")
public class CrawlerService {

@Autowired
@Qualifier("loggerService")
LoggerService loggerService;

@Autowired
@Qualifier("imageTypeExtensionCombo")
ImageTypeExtensionCombo imageTypeExtensionCombo;

public List<String> startCrawler(List<String> links, List<String> images, URL url, String protocol, String protocolHost) throws Exception{
    LinkQueue queue = new LinkQueue(links);
    LinkProducer producer = new LinkProducer(links, url, protocol, protocolHost, queue, loggerService);
    LinkConsumer consumer = new LinkConsumer(links, images, url, protocol, protocolHost, loggerService, queue);
    ExecutorService executorService = Executors.newFixedThreadPool(4);
    executorService.submit(consumer);
    producer.start();
    //consumer.start();
    Thread.currentThread().join();
    executorService.shutdown();
    return images;
  }
}

LinkProducer class

public class LinkProducer extends Thread {

    private List<String> anchorList;
    private URL url;
    private String protocol;
    private String protocolHost;
    private UrlValidator urlValidator = new UrlValidator();
    private LinkQueue queue;
    private LoggerService loggerService;
    private int MAX_QUEUE_SIZE = 2;
    private int counter = 0;
    private boolean stopThread = false;

    private String HTML_TYPE = "HTML";
    private String HTML_CONTENT_TYPE = "text/html";
    private String IMAGE_TYPE = "IMAGE";
    private String NON_HTML_NON_IMAGE_TYPE = "OTHERS";

    public LinkProducer(List<String> anchorList, URL url, String protocol,String protocolHost, LinkQueue queue, LoggerService loggerService) {

        super(protocolHost.replace(protocol, "").replaceAll("/", ""));
        this.anchorList = anchorList;
        this.url = url;
        this.protocol = protocol;
        this.protocolHost = protocolHost;
        this.queue = queue;
        this.loggerService = loggerService;

    }


    public void run() {
        int i = 0;
        while(true) {
            List<String> anchors = null;
            loggerService.log("Producer Thread : " + (++i));
            try {
                anchors = produce();
            } catch (Exception ex) {
                loggerService.log("Exception occurred in producer thread : "+ ex.getMessage());
                ex.printStackTrace();
                if(stopThread){
                    break;
                }
            }
            if(stopThread){
                break;
            }
            if(anchors != null && anchors.size() > 0){
                Iterator<String> iter = anchors.iterator();
                while(iter.hasNext()){
                    synchronized (queue) {
                        queue.enQueue(iter.next());
                    }
                }
            }
        }
    }
 }

LinkConsumer class

public class LinkConsumer extends Thread {

    private List<String> anchorList;
    private List<String> imageList;
    private URL url;
    private String protocol;
    private String protocolHost;
    private LinkQueue queue;
    private LoggerService loggerService;
    private UrlValidator urlValidator = new UrlValidator();

    private String HTML_TYPE = "HTML";

    private String HTML_CONTENT_TYPE = "text/html";

    private String IMAGE_TYPE = "IMAGE";

    private String NON_HTML_NON_IMAGE_TYPE = "OTHERS";

    public LinkConsumer(List<String> anchorList, List<String> imageList, URL url, String protocol,String protocolHost, LoggerService loggerService, LinkQueue queue) {

        super(protocolHost.replace(protocol, "").replaceAll("/", ""));
        this.anchorList = anchorList;
        this.imageList = imageList;
        this.url = url;
        this.protocol = protocol;
        this.protocolHost = protocolHost;
        this.queue = queue;
        this.loggerService = loggerService;
    }

    public void run() {
        int  i = 0;
        while (!queue.isEmpty()) {
            List<String> images = null;
            loggerService.log("Consumer Thread : " + (++i));
            try {
                images = consume();
            } catch (Exception ex) {
                loggerService.log("Exception occurred in consumer thread : "+ ex.getMessage());
                ex.printStackTrace();
            }
            if (images != null && images.size() > 0) {
                Iterator<String> iter = images.iterator();
                while (iter.hasNext()) {
                    imageList.add(iter.next());
                }
            }
        }
    }
 }

Thanks

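3 answers:

Answer 0 (score: 2)

You create and submit only one LinkConsumer, so you have only one worker. The pool has four threads, but three of them never receive any work.

To get real parallelism, you need to create and submit more LinkConsumer instances.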
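A minimal sketch of submitting one consumer task per pool thread, assuming the links are held in a thread-safe `BlockingQueue`. `ParallelConsumers`, `findImages`, and `END_MARKER` are hypothetical names used for illustration and are not part of the question's code. `findImages` only stands in for the real `consume()` logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ParallelConsumers {

    // Hypothetical end-of-input marker; one is queued per worker so every worker stops.
    static final String END_MARKER = "__END__";

    // Hypothetical stand-in for the real image lookup done by consume().
    static List<String> findImages(String link) {
        List<String> found = new ArrayList<>();
        found.add(link + "/image.png");
        return found;
    }

    static List<String> crawl(List<String> links, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> images = new CopyOnWriteArrayList<>(); // safe for concurrent addAll()
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        Runnable consumer = () -> {
            try {
                while (true) {
                    String link = queue.take(); // blocks until a link is available
                    if (END_MARKER.equals(link)) {
                        return;
                    }
                    images.addAll(findImages(link));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and exit
            }
        };

        // Submit one consumer per pool thread instead of a single consumer.
        for (int i = 0; i < workers; i++) {
            pool.submit(consumer);
        }

        // Producer side: enqueue the links, then one end marker per worker.
        queue.addAll(links);
        for (int i = 0; i < workers; i++) {
            queue.add(END_MARKER);
        }

        // Stop accepting tasks and wait for the consumers to drain the queue.
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return images;
    }
}
```

Calling `ParallelConsumers.crawl(links, 4)` would keep all four pool threads busy as long as links remain in the queue. The one-minute timeout is an arbitrary choice for the sketch.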

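Answer 1 (score: 1)

Multithreading will not give you much benefit here. In fact, it only adds complexity when you create more threads than your hardware can handle.

Multithreading yields a significant gain only when you use it effectively. If you keep creating threads this way, you will not get any performance improvement.

Your hardware, especially the processor specification, and the amount of data you write to disk are the main limiting factors that determine the performance you will get.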
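To make the hardware limit concrete, here is a small sketch of a commonly cited rule of thumb for sizing a thread pool: threads = cores × target utilization × (1 + wait time / compute time). The class name `PoolSizing` and the timings used with it are hypothetical. Real wait and compute times would have to be measured on the actual crawler.

```java
class PoolSizing {

    // Rule-of-thumb estimate, not a guarantee:
    // threads = cores * targetUtilization * (1 + waitMs / computeMs)
    static int suggestedThreads(int cores, double targetUtilization,
                                double waitMs, double computeMs) {
        double raw = cores * targetUtilization * (1 + waitMs / computeMs);
        return Math.max(1, (int) Math.round(raw)); // never suggest fewer than one thread
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumed timings: 90 ms waiting on the network, 10 ms parsing per link.
        System.out.println(suggestedThreads(cores, 1.0, 90, 10));
    }
}
```

With purely CPU-bound work (wait time of zero) the estimate falls back to the number of cores, which matches the point that extra threads beyond what the hardware can run bring no gain.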

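I suggest the following. Use multiple machines. One machine acts as the producer and writes all the URLs, images, or whatever else you need to a database. The client systems fetch URLs from the DB and retrieve the data from the source.

Technically, you then have multiple systems at work, and each machine can have ~10 active threads at a time. You only need to write the code once and run the same code on multiple machines. You can also use the producer machine as a consumer.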

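Answer 2 (score: 1)

You can try something like this to create a new thread. But I am not sure that creating new threads will save much time. You would also need better hardware.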

public boolean secondThread() {
    Thread t = new Thread() {
        @Override
        public void run() {
            // do something
        }
    };
    t.start();
    return true;
}
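The snippet above starts the thread and returns immediately, so the caller cannot tell when the work finishes or how long it took. A minimal sketch of waiting for the thread and timing it follows, assuming the work fits in a `Runnable`. `TimedThread` and `runAndTimeMillis` are hypothetical names. Note that `join()` is called on the worker thread; calling `Thread.currentThread().join()`, as the question's code does, makes a thread wait for itself and blocks until it is interrupted.

```java
class TimedThread {

    // Runs the task on a new thread, waits for it to finish,
    // and returns the elapsed wall-clock time in milliseconds.
    static long runAndTimeMillis(Runnable task) {
        long start = System.nanoTime();
        Thread t = new Thread(task);
        t.start();
        try {
            t.join(); // wait for the worker thread, not the current one
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Something like `long ms = TimedThread.runAndTimeMillis(() -> doWork());` (with `doWork` standing in for the real task) would give a rough measurement to compare before and after adding threads.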