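I have a requirement where I hit a link and get a response. The response is XML data that contains child links. The response is then copied to a file and the child links are added to a queue; I then have to hit the child links iteratively until there are no more children.
I first did this with a single queue. Because that was slow, I tried to implement an executor. I do not need to maintain the order of the data. Here is my approach: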
public class Hierarchy2 {
private static AbstractQueue<String> queue = new ConcurrentLinkedQueue<>();
private static FileWriter writer;
private static SAXParser saxParser;
private static XMLHandler xmlHandler = new XMLHandler();
public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException {
writer = new FileWriter(new File("hierarchy.txt"));
String baseUrl = "my url here";
queue.add(baseUrl);
int threadCount = Runtime.getRuntime().availableProcessors() + 1;
ExecutorService executor = Executors.newFixedThreadPool(threadCount);
for (int i = 0; i < threadCount; i++) {
executor.execute(new QueueProcess(queue, writer, xmlHandler));
}
executor.shutdown();
}
}
class QueueProcess implements Runnable {
private AbstractQueue<String> queue;
private HttpURLConnection connection;
private URL url;
private FileWriter writer;
private SAXParserFactory factory = SAXParserFactory.newInstance();
private SAXParser saxParser;
private XMLHandler xmlHandler;
public QueueProcess(AbstractQueue<String> queue, FileWriter writer, XMLHandler xmlHandler) {
this.queue = queue;
this.writer = writer;
this.xmlHandler = xmlHandler;
}
@Override
public void run() {
try {
saxParser = factory.newSAXParser();
while (true) {
String link = queue.poll();
if (link != null) {
if (queue.size() >= 500) {
System.out.println("here" + " " + Thread.currentThread().getName());
getChildLinks(link);
} else {
System.out.println(link + " " + Thread.currentThread().getName());
queue.addAll(getChildLinks(link));
}
}
}
} catch (IOException | SAXException | ParserConfigurationException e) {
e.printStackTrace();
}
}
private List<String> getChildLinks(String link) throws IOException, SAXException {
url = new URL(link);
connection = (HttpURLConnection) url.openConnection();
connection.connect();
String result = new BufferedReader(new InputStreamReader(connection.getInputStream())).lines()
.collect(Collectors.joining());
saxParser.parse(new ByteArrayInputStream(result.getBytes()), xmlHandler);
List<String> urlList = xmlHandler.getURLList();
writer.write(result + System.lineSeparator());
connection.disconnect();
return urlList;
}
}
The program runs fine, but at some point I get a NullPointerException. It occurs on the queue.addAll line in the run method of QueueProcess.
Exception:
Exception in thread "pool-1-thread-3" java.lang.NullPointerException
at java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:914)
at java.util.concurrent.ConcurrentLinkedQueue.addAll(ConcurrentLinkedQueue.java:525)
at QueueProcess.run(Hierarchy2.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
at java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:914)
at java.util.concurrent.ConcurrentLinkedQueue.addAll(ConcurrentLinkedQueue.java:525)
at QueueProcess.run(Hierarchy2.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
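I cannot figure out why there is an NPE, since I check whether the queue is empty on every iteration of the while loop. Please tell me why I am getting a NullPointerException and how I can prevent it.
Update
I finally fixed the NPE. As @gusto2 suggested, it was caused by the ArrayList containing a null value, which the queue does not accept.
Now my code looks like this: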
public class Hierarchy2 {
private static BlockingQueue<String> queue = new LinkedBlockingQueue<>();
private static FileWriter writer;
private static XMLHandler xmlHandler = new XMLHandler();
public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException {
writer = new FileWriter(new File("hierarchy.txt"));
String baseUrl = "my url here";
queue.add(baseUrl);
int threadCount = Runtime.getRuntime().availableProcessors() + 1;
ExecutorService executor = Executors.newFixedThreadPool(threadCount);
for (int i = 0; i < threadCount; i++) {
executor.execute(new QueueProcess(queue, writer, xmlHandler));
}
executor.shutdown();
}
}
class QueueProcess implements Runnable {
private BlockingQueue<String> queue;
private HttpURLConnection connection;
private URL url;
private FileWriter writer;
private SAXParserFactory factory = SAXParserFactory.newInstance();
private SAXParser saxParser;
private XMLHandler xmlHandler = new XMLHandler();
public QueueProcess(BlockingQueue<String> queue, FileWriter writer, XMLHandler xmlHandler) {
this.queue = queue;
this.writer = writer;
}
@Override
public void run() {
try {
saxParser = factory.newSAXParser();
while (true) {
String link = queue.poll();
if (link != null) {
System.out.println(link + " " + Thread.currentThread().getName());
queue.addAll(getChildLinks(link));
}
}
} catch (IOException | SAXException | ParserConfigurationException e) {
e.printStackTrace();
}
}
private List<String> getChildLinks(String link) throws IOException, SAXException {
url = new URL(link);
connection = (HttpURLConnection) url.openConnection();
connection.connect();
String result = new BufferedReader(new InputStreamReader(connection.getInputStream())).lines()
.collect(Collectors.joining());
saxParser.parse(new ByteArrayInputStream(result.getBytes()), xmlHandler);
List<String> urlList = xmlHandler.getURLList();
writer.write(result + System.lineSeparator());
connection.disconnect();
return urlList;
}
}
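The problem now is pausing the threads once they have TOGETHER processed 500 records. Once 500 is reached, I have to create another file and then start processing again.
Please also tell me how to stop the code once the whole queue has been read, i.e. when no more child links are being added to the queue. Since I am using a while loop that is always true, the code runs indefinitely. If I use the condition while(!queue.isEmpty()) instead, only one thread will run, because the other threads will find the queue empty.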
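One common way to approach the termination part is to count outstanding work rather than test whether the queue is empty. The sketch below is a minimal illustration under stated assumptions, not a drop-in fix: CrawlTermination, crawl and fetchChildren are hypothetical names, fetchChildren stands in for the HTTP and SAX step, and the links are assumed to form a tree with no cycles. It does not cover the 500-record file rotation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CrawlTermination {

    // Processes every link reachable from root and returns the processed links.
    // fetchChildren is a hypothetical stand-in for "request the link and parse its child links".
    static Set<String> crawl(String root, Function<String, List<String>> fetchChildren, int threads)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Set<String> processed = ConcurrentHashMap.newKeySet();
        // Number of links that are either waiting in the queue or currently being processed.
        AtomicInteger pending = new AtomicInteger(1);
        queue.add(root);

        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(() -> {
                try {
                    // An empty queue alone is not a stop signal: another worker may still add children.
                    while (pending.get() > 0) {
                        String link = queue.poll(100, TimeUnit.MILLISECONDS);
                        if (link == null) {
                            continue; // nothing available right now, re-check the counter
                        }
                        try {
                            processed.add(link);
                            for (String child : fetchChildren.apply(link)) {
                                if (child != null) {
                                    pending.incrementAndGet(); // count the child before enqueueing it
                                    queue.add(child);
                                }
                            }
                        } finally {
                            pending.decrementAndGet(); // this link is finished, even on failure
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        // In-memory tree used instead of real URLs, purely for illustration.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("root", Arrays.asList("a", "b"));
        tree.put("a", Arrays.asList("a1", "a2"));
        Set<String> seen = crawl("root", l -> tree.getOrDefault(l, Collections.emptyList()), 4);
        System.out.println(seen);
    }
}
```

Because each child is counted before its parent is marked finished, the counter can only reach zero when no link is queued or in progress, so all workers leave the loop together instead of one thread draining the queue alone.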
Answer 0 (score: 1)
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
at java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:914)
at java.util.concurrent.ConcurrentLinkedQueue.addAll(ConcurrentLinkedQueue.java:525)
My guess is that List<String> urlList = xmlHandler.getURLList(); returns an ArrayList with some null values inside. Without more information it is hard to be more precise.
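The guess above can be reproduced in isolation. This is a minimal sketch, assuming the handler returns a list with a null entry; NullInQueueDemo, withoutNulls and the example.com URLs are hypothetical and only illustrate the behaviour. ConcurrentLinkedQueue documents that addAll throws a NullPointerException if the collection or any of its elements is null, which matches the checkNotNull frame in the stack trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

public class NullInQueueDemo {

    // Returns a copy of the list with null entries removed.
    static List<String> withoutNulls(List<String> links) {
        return links.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        // Hypothetical handler output in which one entry is null.
        List<String> urlList = new ArrayList<>(
                Arrays.asList("http://example.com/a", null, "http://example.com/b"));

        try {
            queue.addAll(urlList); // rejected because of the null element
        } catch (NullPointerException e) {
            System.out.println("addAll rejected a null element");
        }

        // Filtering the list first avoids the exception.
        queue.addAll(withoutNulls(urlList));
        System.out.println(queue);
    }
}
```

Filtering at the call site only hides the symptom; if the nulls come from the handler, it is probably worth checking why getURLList() produces them in the first place.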
Answer 1 (score: 0)
Here the base URL is added to the queue only once. It is not inside the loop.
queue.add(baseUrl);
int threadCount = Runtime.getRuntime().availableProcessors() + 1;
ExecutorService executor = Executors.newFixedThreadPool(threadCount);
for (int i = 0; i < threadCount; i++) {
executor.execute(new QueueProcess(queue, writer, xmlHandler));
}
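So when you call QueueProcess(queue, writer, xmlHandler), you pass in a single entry. Then, when you call String link = queue.poll();, it removes that one added value. How can queue.size() >= 500 ever be true if only a single value was added to the queue, outside the for loop that calls QueueProcess(queue, writer, xmlHandler)?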