I'm tasked with iterating over all links and sub-links of a given web portal. In most cases, when the web pages are not too complex or large, I don't have any problems. The trouble starts when I check the links of a really complex site such as tutorialspoint, and my computer just crashes. I can't find any performance issue in the code I attached, so can someone experienced tell me where the possible threat is in my code, i.e. where it crashes?
The uniqueLinks collection is a HashSet, chosen for the best performance of contains.
private void recursiveLinkSearch(String webPage) {
    // ignore pdf links
    try {
        logger.info(webPage);
        uniqueLinks.add(webPage);
        Document doc = Jsoup.connect(webPage).get();
        doc.select("a").forEach(record -> {
            String url = record.absUrl("href");
            if (!uniqueLinks.contains(url)) {
                // this prevents recursing into links from another domain
                if (url.contains(getWebPortalDomain())) {
                    recursiveLinkSearch(url);
                }
            }
        });
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Answer 0 (score: 1)
I assume you don't literally mean that your computer crashed. I think you mean that your application crashed, and I expect it was caused by a StackOverflowError.

Recursion in Java has a fundamental limitation. If a thread recurses too deeply, it fills up its stack and you get a StackOverflowError. You can work around this (in some cases) by using a larger thread stack, but that only helps up to a point.
What you should do in this case is turn the recursive problem into an iterative one. For example:
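A minimal sketch of the transformation: instead of the thread's call stack, an explicit Deque on the heap holds the pages still to visit, so depth is no longer limited by stack size. The SAMPLE_SITE map here is a hypothetical stand-in for the real Jsoup fetch (Jsoup.connect(page).get() plus doc.select("a")) so the traversal logic can be shown without network access; the domain filter from the question is omitted for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IterativeLinkSearch {

    // Hypothetical stand-in for Jsoup: maps each page URL to the links found on it.
    static final Map<String, List<String>> SAMPLE_SITE = Map.of(
        "http://site/a", List.of("http://site/b", "http://site/c"),
        "http://site/b", List.of("http://site/a", "http://site/c"),
        "http://site/c", List.of("http://site/d"),
        "http://site/d", List.of()
    );

    // Iterative replacement for recursiveLinkSearch: an explicit work queue
    // on the heap instead of recursive calls on the thread stack.
    static Set<String> crawl(String start, Map<String, List<String>> fetch) {
        Set<String> uniqueLinks = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        uniqueLinks.add(start);
        toVisit.add(start);
        while (!toVisit.isEmpty()) {
            String page = toVisit.poll();
            for (String url : fetch.getOrDefault(page, List.of())) {
                // add() returns false for an already-seen URL,
                // so every page is enqueued (and "fetched") at most once
                if (uniqueLinks.add(url)) {
                    toVisit.add(url);
                }
            }
        }
        return uniqueLinks;
    }

    public static void main(String[] args) {
        System.out.println(crawl("http://site/a", SAMPLE_SITE));
    }
}
```

The same uniqueLinks set from the question still deduplicates URLs; only the mechanism for remembering "what to visit next" moves from the call stack to the queue.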
A simple way to do this is to use an ExecutorService with a bounded worker pool. That also takes care of the queue management for you.
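A hedged sketch of that variant, under the same assumption that a SAMPLE_SITE map stands in for the real Jsoup fetch: each page becomes a task submitted to a fixed pool of four threads, a concurrent set deduplicates URLs, and an atomic counter of outstanding tasks signals when the crawl is finished. The class and method names are illustrative, not from the original answer.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentLinkSearch {

    // Hypothetical stand-in for Jsoup: maps each page URL to the links found on it.
    static final Map<String, List<String>> SAMPLE_SITE = Map.of(
        "http://site/a", List.of("http://site/b", "http://site/c"),
        "http://site/b", List.of("http://site/a", "http://site/c"),
        "http://site/c", List.of("http://site/d"),
        "http://site/d", List.of()
    );

    static Set<String> crawl(String start, Map<String, List<String>> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // bounded worker pool
        Set<String> seen = ConcurrentHashMap.newKeySet();       // thread-safe uniqueLinks
        AtomicInteger pending = new AtomicInteger();            // tasks not yet finished
        CountDownLatch done = new CountDownLatch(1);
        seen.add(start);
        pending.incrementAndGet();
        pool.submit(() -> visit(start, fetch, pool, seen, pending, done));
        try {
            done.await(); // block until the last task reports completion
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return seen;
    }

    static void visit(String page, Map<String, List<String>> fetch, ExecutorService pool,
                      Set<String> seen, AtomicInteger pending, CountDownLatch done) {
        try {
            for (String url : fetch.getOrDefault(page, List.of())) {
                // add() is atomic, so each URL is claimed by exactly one thread
                if (seen.add(url)) {
                    pending.incrementAndGet();
                    pool.submit(() -> visit(url, fetch, pool, seen, pending, done));
                }
            }
        } finally {
            // the crawl is complete when no tasks remain outstanding
            if (pending.decrementAndGet() == 0) {
                done.countDown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(crawl("http://site/a", SAMPLE_SITE));
    }
}
```

The pending counter is incremented before each submit and decremented after a task's children are submitted, so it can only reach zero when every page has been processed; the executor's internal queue replaces the explicit Deque.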