Question

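I have the following method, which I call from my map task in multi-threaded execution. It works fine in standalone mode, but when I run it on Hadoop YARN it exhausts the 1 GB of physical memory, and virtual memory usage shoots up as well.

I need to know whether I am doing something wrong from a programming point of view. I believe I am closing every stream I open, so I don't see any reason for a memory leak. Please advise.

Thanks.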

public static void manageTheCurrentURL(String url) {

logger.trace("Entering the method manageTheCurrentURL ");

InputStream stream = null;
InputStream is = null;
ByteArrayOutputStream out = null;
WebDriver driver = null;
try {

    if (StringUtils.isNotBlank(url)) {

        caps.setJavascriptEnabled(true); // not really needed: JS
                                            // enabled by default
        caps.setCapability(
                PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                "/usr/local/bin/phantomjs");

        // Launch driver (will take care and ownership of the phantomjs
        // process)
        driver = new PhantomJSDriver(caps);
        driver.get(url);
        String htmlContent = driver.getPageSource();

        if (htmlContent != null) {

            is = new ByteArrayInputStream(htmlContent.getBytes());

            ByteArrayDocumentSource byteArrayDocumentSource = new ByteArrayDocumentSource(
                    is, url, "text/html");

            Any23 runner = new Any23();
            runner.setHTTPUserAgent("test-user-agent");

            out = new ByteArrayOutputStream();
            TripleHandler handler = new NTriplesWriter(out);

            try {
                runner.extract(byteArrayDocumentSource, handler);
            } catch (ExtractionException e) {


            } finally {

                if (driver != null) {
                    driver.quit();
                    //driver.close();
                }

                try {
                    handler.close();

                } catch (TripleHandlerException e) {

                }
                if (is != null) {
                    try {
                        is.close();
                    } catch (IOException e) {
                    }
                }

            }

            if (out != null) {

                stream = new ByteArrayInputStream(out.toByteArray());
                Iterator<Node[]> it = new DeltaParser(stream);
                if (it != null) {

                    SolrCallbackForNXParser callback = new SolrCallbackForNXParser(
                            url);
                    callback.startStory();

                    while (it.hasNext()) {
                        Node[] abc = it.next();
                        callback.processStory(abc);
                    }

                    callback.endStory();
                }
            }

        }

    }

} catch (IOException e) {
    return;
}

finally {

    if (stream != null) {
        try {
            stream.close();
        } catch (IOException e) {
        }
    }
    if (out != null) {
        try {
            out.close();
        } catch (IOException e) {
        }

    }
}

logger.trace("Exiting the method manageTheCurrentURL ");

}
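
One thing that stands out in the method above: driver.quit() is only reached inside the inner finally, so it is skipped whenever driver.get(url) throws or getPageSource() returns null. In those cases the phantomjs process would be left running. As far as I know, YARN's container memory check counts the whole process tree of the task, child processes included, so leftover phantomjs processes from several threads could add up against the 1 GB limit even when the Java streams are all closed. The following is only a minimal sketch of moving the quit into a finally that always runs. The Browser interface and FakeBrowser class are hypothetical stand-ins for WebDriver and PhantomJSDriver so that the example compiles without Selenium; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class DriverCleanupSketch {

    /** Hypothetical stand-in for WebDriver; only the three calls used above. */
    interface Browser {
        void get(String url);
        String getPageSource();
        void quit();
    }

    /**
     * Fetches the page source and always quits the browser,
     * even if get() or getPageSource() throws.
     */
    static String fetchPageSource(Supplier<Browser> factory, String url) {
        Browser browser = null;
        try {
            browser = factory.get();
            browser.get(url);
            return browser.getPageSource();
        } finally {
            if (browser != null) {
                browser.quit(); // releases the external browser process
            }
        }
    }

    /** Hypothetical fake that records calls and can fail on get(). */
    static class FakeBrowser implements Browser {
        final List<String> calls = new ArrayList<>();
        final boolean failOnGet;

        FakeBrowser(boolean failOnGet) {
            this.failOnGet = failOnGet;
        }

        public void get(String url) {
            calls.add("get");
            if (failOnGet) {
                throw new IllegalStateException("navigation failed");
            }
        }

        public String getPageSource() {
            calls.add("source");
            return "<html></html>";
        }

        public void quit() {
            calls.add("quit");
        }
    }

    public static void main(String[] args) {
        FakeBrowser failing = new FakeBrowser(true);
        try {
            fetchPageSource(() -> failing, "http://example.com");
        } catch (IllegalStateException e) {
            // quit() was still recorded although get() threw
            System.out.println(failing.calls);
        }
    }
}
```

In the real method, the same shape would mean wrapping everything from new PhantomJSDriver(caps) onward in a try block whose finally calls driver.quit(), and running the Any23 extraction on the returned HTML string afterwards. Whether this accounts for all of the memory growth is an assumption that would need to be checked on the cluster, for example by counting the phantomjs processes on the node while the job runs.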

Hadoop YARN map task exhausts physical and virtual memory

0 answers: