Computer crash when I use recursion to check all HTML links and sublinks of a web site

Time: 2017-04-30 02:03:55

Tags: java html recursion web crash

I'm tasked with iterating over all links and sublinks of a given web portal. In most cases, when the web pages are not too complex or large, I don't have any problems. The problems start when I check the links of a really complex site such as tutorialspoint, and my computer just crashes. I can't find any performance issue in the code I attached, so can someone experienced tell me where the possible threat is in my code that makes my computer crash?

The uniqueLinks collection is a HashSet, for the best performance when using contains.

// imports used by this method:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

private void recursiveLinkSearch(String webPage) {
    // TODO: ignore PDF links
    try {
        logger.info(webPage);
        uniqueLinks.add(webPage);
        Document doc = Jsoup.connect(webPage).get();
        doc.select("a").forEach(record -> {
            String url = record.absUrl("href");
            if (!uniqueLinks.contains(url)) {
                // Only follow links that stay on the portal's own domain.
                if (url.contains(getWebPortalDomain())) {
                    recursiveLinkSearch(url);
                }
            }
        });
    } catch (IOException e) {
        e.printStackTrace();
    }
}

1 Answer:

Answer 0 (score: 1)

I assume you don't literally mean that your computer crashed. I think you mean that your application crashed, presumably due to a StackOverflowError.

Recursion in Java has a fundamental limitation. If a thread recurses too deeply, it fills up its stack and you get a StackOverflowError. You can work around this (in some cases) by giving the thread a larger stack, but that only works up to a point.
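For reference, here is a minimal sketch of the larger-stack workaround, assuming it lives in the same class as the question's recursiveLinkSearch. The 16 MB value and the method name crawlWithBigStack are only illustrative; the JVM-wide equivalent is the -Xss flag (for example -Xss16m).

    // Runs the existing recursive crawl on a dedicated thread whose stack size
    // is requested explicitly (the value is a hint to the JVM, not a guarantee).
    public void crawlWithBigStack(String startPage) throws InterruptedException {
        Thread crawler = new Thread(
                null,                                   // default thread group
                () -> recursiveLinkSearch(startPage),   // the question's recursive method
                "crawler",
                16L * 1024 * 1024);                     // requested stack size in bytes
        crawler.start();
        crawler.join();
    }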

What you should do in a case like this is turn the recursive problem into an iterative one. For example:

  1. Use a data structure to hold a queue of URLs waiting to be processed.
  2. When you process a page and find links to other pages that need processing, add those links to the queue.
  3. A simple way to do this is to use an ExecutorService with a bounded worker pool; that also takes care of the queue management (see the sketch after this list).
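For illustration, here is a minimal sketch of the iterative approach, reusing Jsoup and a HashSet as in the question. The class name IterativeLinkCrawler, the constructor parameter webPortalDomain, and the crawl method are assumptions for this sketch, not part of the original code.

    import java.io.IOException;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class IterativeLinkCrawler {

        private final Set<String> uniqueLinks = new HashSet<>();
        private final String webPortalDomain;

        public IterativeLinkCrawler(String webPortalDomain) {
            this.webPortalDomain = webPortalDomain;
        }

        // Breadth-first crawl driven by an explicit queue instead of the call
        // stack, so the depth of the link graph can no longer overflow a
        // thread's stack.
        public Set<String> crawl(String startPage) {
            Deque<String> pending = new ArrayDeque<>();
            pending.add(startPage);
            uniqueLinks.add(startPage);

            while (!pending.isEmpty()) {
                String webPage = pending.poll();
                try {
                    Document doc = Jsoup.connect(webPage).get();
                    doc.select("a").forEach(record -> {
                        String url = record.absUrl("href");
                        // HashSet.add returns true only for URLs not seen
                        // before, so every page is enqueued at most once.
                        if (url.contains(webPortalDomain) && uniqueLinks.add(url)) {
                            pending.add(url);
                        }
                    });
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            return uniqueLinks;
        }
    }

If you want to crawl concurrently, the same structure maps onto an ExecutorService with a bounded worker pool: each dequeued URL becomes a submitted task, and the executor's own work queue takes over the role of the ArrayDeque, as the answer suggests.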