Synchronizing access to a queue

Date: 2017-09-19 10:09:26

Tags: java multithreading

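I have a requirement where I request a link and get a response. The response is XML data that contains child links. I copy the response to a file and add the child links to a queue, then iteratively request each child link until no children remain.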

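I first did this with a single queue on one thread. Because that was slow, I tried to use an executor. I do not need to preserve the order of the data. Here is my approach: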

public class Hierarchy2 {

    private static AbstractQueue<String> queue = new ConcurrentLinkedQueue<>();
    private static FileWriter writer;

    private static SAXParser saxParser;
    private static XMLHandler xmlHandler = new XMLHandler();

    public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException {
        writer = new FileWriter(new File("hierarchy.txt"));
        String baseUrl = "my url here";

        queue.add(baseUrl);

        int threadCount = Runtime.getRuntime().availableProcessors() + 1;
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);

        for (int i = 0; i < threadCount; i++) {
            executor.execute(new QueueProcess(queue, writer, xmlHandler));
        }

        executor.shutdown();

    }
}

class QueueProcess implements Runnable {

    private AbstractQueue<String> queue;
    private HttpURLConnection connection;
    private URL url;
    private FileWriter writer;
    private SAXParserFactory factory = SAXParserFactory.newInstance();
    private SAXParser saxParser;
    private XMLHandler xmlHandler;

    public QueueProcess(AbstractQueue<String> queue, FileWriter writer, XMLHandler xmlHandler) {
        this.queue = queue;
        this.writer = writer;

        this.xmlHandler = xmlHandler;
    }

    @Override
    public void run() {
        try {
            saxParser = factory.newSAXParser();
            while (true) {
                String link = queue.poll();
                if (link != null) {
                    if (queue.size() >= 500) {
                        System.out.println("here" + "     " + Thread.currentThread().getName());
                        getChildLinks(link);
                    } else {
                        System.out.println(link + "     " + Thread.currentThread().getName());
                        queue.addAll(getChildLinks(link));
                    }
                }
            }
        } catch (IOException | SAXException | ParserConfigurationException e) {
            e.printStackTrace();
        }

    }

    private List<String> getChildLinks(String link) throws IOException, SAXException {
        url = new URL(link);
        connection = (HttpURLConnection) url.openConnection();
        connection.connect();

        String result = new BufferedReader(new InputStreamReader(connection.getInputStream())).lines()
                .collect(Collectors.joining());

        saxParser.parse(new ByteArrayInputStream(result.getBytes()), xmlHandler);
        List<String> urlList = xmlHandler.getURLList();

        writer.write(result + System.lineSeparator());

        connection.disconnect();
        return urlList;
    }

}

The program runs fine, but at some point I get a NullPointerException. It is thrown at the queue.addAll line in the run method of QueueProcess.

Exception:

Exception in thread "pool-1-thread-3" java.lang.NullPointerException
    at java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:914)
    at java.util.concurrent.ConcurrentLinkedQueue.addAll(ConcurrentLinkedQueue.java:525)
    at QueueProcess.run(Hierarchy2.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
    at java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:914)
    at java.util.concurrent.ConcurrentLinkedQueue.addAll(ConcurrentLinkedQueue.java:525)
    at QueueProcess.run(Hierarchy2.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

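I cannot figure out why there is an NPE, since I check the polled value for null on every iteration of the while loop. Please tell me why I get a NullPointerException and how I can prevent it.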

Update

I finally fixed the NPE. As @gusto2 suggested, it was caused by the ArrayList containing a null value, which the queue does not accept.

My code now looks like this:

public class Hierarchy2 {

    private static BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private static FileWriter writer;
    private static XMLHandler xmlHandler = new XMLHandler();

    public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException {
        writer = new FileWriter(new File("hierarchy.txt"));
        String baseUrl = "my url here";

        queue.add(baseUrl);

        int threadCount = Runtime.getRuntime().availableProcessors() + 1;
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);

        for (int i = 0; i < threadCount; i++) {
            executor.execute(new QueueProcess(queue, writer, xmlHandler));
        }

        executor.shutdown();

    }
}

class QueueProcess implements Runnable {

    private BlockingQueue<String> queue;
    private HttpURLConnection connection;
    private URL url;
    private FileWriter writer;
    private SAXParserFactory factory = SAXParserFactory.newInstance();
    private SAXParser saxParser;
    private XMLHandler xmlHandler = new XMLHandler();

    public QueueProcess(BlockingQueue<String> queue, FileWriter writer, XMLHandler xmlHandler) {
        this.queue = queue;
        this.writer = writer;
    }

    @Override
    public void run() {
        try {
            saxParser = factory.newSAXParser();
            while (true) {
                String link = queue.poll();
                if (link != null) {
                    System.out.println(link + "     " + Thread.currentThread().getName());
                    queue.addAll(getChildLinks(link));
                }
            }
        } catch (IOException | SAXException | ParserConfigurationException e) {
            e.printStackTrace();
        }

    }

    private List<String> getChildLinks(String link) throws IOException, SAXException {
        url = new URL(link);
        connection = (HttpURLConnection) url.openConnection();
        connection.connect();

        String result = new BufferedReader(new InputStreamReader(connection.getInputStream())).lines()
                .collect(Collectors.joining());

        saxParser.parse(new ByteArrayInputStream(result.getBytes()), xmlHandler);
        List<String> urlList = xmlHandler.getURLList();

        writer.write(result + System.lineSeparator());

        connection.disconnect();
        return urlList;
    }

}

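The problem now is how to pause the threads once they have TOGETHER processed 500 records. Once 500 is reached, I have to create another file and then resume processing.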
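One possible way to handle the 500-record rollover without pausing the worker threads is to route every write through a single synchronized object that counts records and opens the next file when the limit is reached. The sketch below is only an illustration under assumptions: the RollingWriter class, its WriterFactory interface, and the hierarchy-N.txt naming mentioned in the comment are hypothetical names that do not appear in the original code.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a thread-safe writer that starts a new output
// after every `limit` records.
class RollingWriter {

    // Opens the output for the given file index (hypothetical interface).
    interface WriterFactory {
        Writer open(int index) throws IOException;
    }

    private final int limit;
    private final WriterFactory factory;
    private Writer current;
    private int written;   // records written to the current output
    private int fileIndex; // index of the current output

    RollingWriter(int limit, WriterFactory factory) throws IOException {
        this.limit = limit;
        this.factory = factory;
        this.current = factory.open(0);
    }

    // Only one thread at a time can write, so the counter and the
    // rollover stay consistent across all workers.
    synchronized void write(String record) throws IOException {
        if (written == limit) {
            current.close();
            fileIndex++;
            current = factory.open(fileIndex);
            written = 0;
        }
        current.write(record + System.lineSeparator());
        written++;
    }

    synchronized void close() throws IOException {
        current.close();
    }

    synchronized int fileCount() {
        return fileIndex + 1;
    }

    public static void main(String[] args) throws IOException {
        List<StringWriter> files = new ArrayList<>();
        // In-memory writers keep the demo free of disk I/O; a real run could use
        // i -> new FileWriter("hierarchy-" + i + ".txt") (hypothetical file names).
        RollingWriter writer = new RollingWriter(500, i -> {
            StringWriter s = new StringWriter();
            files.add(s);
            return s;
        });
        for (int i = 0; i < 1200; i++) {
            writer.write("record " + i);
        }
        writer.close();
        System.out.println("files created: " + writer.fileCount());
    }
}
```

With this shape, each QueueProcess would share one RollingWriter instead of a raw FileWriter, and the threads would block only for the short time a write or rollover takes.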

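Please also tell me how to stop the code once the whole queue has been read, i.e. when no more child links are being added to the queue. Because I use a while loop that is always true, the code runs indefinitely. If I use the condition while(!queue.isEmpty()) instead, only one thread keeps running, because the other threads find the queue empty and exit.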
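One common way to detect the end of the work described above is to count links that are either queued or still being processed, and let each worker exit once that count drops to zero. The following is a minimal sketch, not the original poster's code: the Crawl class, the fetchChildren function (standing in for getChildLinks), and the 100 ms poll timeout are assumptions made for illustration, and the sketch assumes the link hierarchy contains no cycles.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class Crawl {

    // Processes the hierarchy to completion and returns every link that was handled.
    static Set<String> run(String root, Function<String, List<String>> fetchChildren, int threads)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger pending = new AtomicInteger(); // links queued or in progress
        Set<String> processed = ConcurrentHashMap.newKeySet();

        pending.incrementAndGet();
        queue.add(root);

        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(() -> {
                try {
                    // pending == 0 means the queue is empty AND no worker can add more.
                    while (pending.get() > 0) {
                        String link = queue.poll(100, TimeUnit.MILLISECONDS);
                        if (link == null) {
                            continue; // nothing queued right now; re-check pending
                        }
                        try {
                            processed.add(link);
                            for (String child : fetchChildren.apply(link)) {
                                if (child != null) {
                                    pending.incrementAndGet(); // count before enqueueing
                                    queue.add(child);
                                }
                            }
                        } finally {
                            pending.decrementAndGet(); // this link is finished
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.HOURS);
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for getChildLinks: every link has two children until length 4.
        Function<String, List<String>> fake = link -> link.length() >= 4
                ? Collections.<String>emptyList()
                : Arrays.asList(link + "0", link + "1");
        System.out.println("processed " + run("r", fake, 4).size() + " links");
    }
}
```

Because a child is counted before its parent is marked finished, the counter cannot reach zero while work remains, so an idle thread that sees an empty queue keeps waiting instead of exiting early. The timed poll also avoids the busy spin of calling poll() in a tight while (true) loop.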

2 Answers:

Answer 0: (score: 1)

Exception in thread "pool-1-thread-1" java.lang.NullPointerException
    at java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:914)
    at java.util.concurrent.ConcurrentLinkedQueue.addAll(ConcurrentLinkedQueue.java:525)

My guess is that List<String> urlList = xmlHandler.getURLList(); returns an ArrayList that contains some null values. Without more information it is hard to be more precise.
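To illustrate the guess above: ConcurrentLinkedQueue rejects null elements, so a single null inside the list passed to addAll is enough to raise the NullPointerException seen in the stack trace. The sketch below reproduces that and shows one way to filter nulls before enqueueing. The NullSafeEnqueue class, the addNonNull helper, and the example.com URLs are hypothetical and only stand in for the real handler output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

class NullSafeEnqueue {

    // Adds only the non-null links to the queue and returns how many were added.
    static int addNonNull(Queue<String> queue, List<String> links) {
        List<String> clean = links.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
        queue.addAll(clean);
        return clean.size();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        // Hypothetical handler output that contains a null entry.
        List<String> links = Arrays.asList("http://example.com/a", null, "http://example.com/b");

        try {
            queue.addAll(links); // the null element makes addAll throw
        } catch (NullPointerException e) {
            System.out.println("addAll rejected a null element");
        }

        queue.clear(); // start from a known-empty queue for the second attempt
        System.out.println("added " + addNonNull(queue, links) + " links");
    }
}
```

Filtering at the call site treats the symptom. If the handler really does produce nulls, it may be worth checking why it emits them in the first place.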

Answer 1: (score: 0)

Here the base URL is added to the queue only once. It is not inside the loop.

    queue.add(baseUrl);

    int threadCount = Runtime.getRuntime().availableProcessors() + 1;
    ExecutorService executor = Executors.newFixedThreadPool(threadCount);

    for (int i = 0; i < threadCount; i++) {
        executor.execute(new QueueProcess(queue, writer, xmlHandler));
    }

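So when you call QueueProcess(queue, writer, xmlHandler), you pass a queue that holds a single entry. Then, when you call String link = queue.poll();, it removes that one added value. How can queue.size() >= 500 ever be true if the queue passed to QueueProcess(queue, writer, xmlHandler) in the for loop contains only a single value?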