How can I update a new Solr search collection without throwing a Java_heap_space error?

Time: 2012-12-04 22:14:04

Tags: coldfusion solr coldfusion-10 verity

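We are currently running ColdFusion 8 but plan to move to ColdFusion 10 soon. One of the biggest problems with this move is that one of our most important applications includes a full-text document search that is currently built on Verity collections. It essentially lets users search the text content of hundreds of PDF documents.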

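I just created a new Solr collection in my development ColdFusion 9 instance and tried to update it with our existing indexing logic. That logic runs daily and updates the collection from PDF documents stored on the local server as F:\PDFS\[documentId].PDF: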

<cfsetting requesttimeout="3600" />

<cfquery name="getDocs" datasource="myDB">
    SELECT DISTINCT
        itemNo,
        edition,
        description,
        status,
        'F:\PDFs\'
            CONCAT documentId
            CONCAT '.PDF'   AS  document_file
    FROM    SKU_ATTRIBUTES
</cfquery>

<cfindex
    query="getDocs"
    collection="mysolrcollection"
    action="refresh"
    type="file"
    key="document_file"
    title="description"
    custom1="itemNo"
    custom2="status"
    custom3="edition" />

It ran for about 10 minutes and then bombed with the following exception:

Java_heap_space__javalangOutOfMemoryError_Java_heap_space___at_orgapacheluceneutilUnicodeUtilUTF16toUTF8UnicodeUtiljava236___at_orgapachelucenestoreIndexOutputwriteStringIndexOutputjava102___at_orgapacheluceneindexFieldsWriterwriteFieldFieldsWriterjava232___at_orgapacheluceneindexStoredFieldsWriterPerFieldprocessFieldsStoredFieldsWriterPerFieldjava56___at_orgapacheluceneindexDocFieldConsumersPerFieldprocessFieldsDocFieldConsumersPerFieldjava37___at_orgapacheluceneindexDocFieldProcessorPerThreadprocessDocumentDocFieldProcessorPerThreadjava234___at_orgapacheluceneindexDocumentsWriterupdateDocumentDocumentsWriterjava762___at_orgapacheluceneindexDocumentsWriterupdateDocumentDocumentsWriterjava745___at_orgapacheluceneindexIndexWriterupdateDocumentIndexWriterjava2215___at_orgapacheluceneindexIndexWriterupdateDocumentIndexWriterjava2187___at_orgapachesolrupdateDirectUpdateHandler2addDocDirectUpdateHandler2java238___at_orgapachesolrupdateprocessorRunUpdateProcessorprocessAddRunUpdateProcessorFactoryjava60___at_orgapachesolrhandlerXMLLoaderprocessUpdateXMLLoaderjava140___at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava69___at_orgapachesolrhandlerContentStreamHandlerBasehandleRequestBodyContentStreamHandlerBasejava54___at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131___at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1333___at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava303___at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava232___at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089___at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365___at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216___at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181___at_orgmortbayjettyhan

Java_heap_space__javalangOutOfMemoryError_Java_heap_space___at_orgapacheluceneutilUnicodeUtilUTF16toUTF8UnicodeUtiljava236___at_orgapachelucenestoreIndexOutputwriteStringIndexOutputjava102___at_orgapacheluceneindexFieldsWriterwriteFieldFieldsWriterjava232___at_orgapacheluceneindexStoredFieldsWriterPerFieldprocessFieldsStoredFieldsWriterPerFieldjava56___at_orgapacheluceneindexDocFieldConsumersPerFieldprocessFieldsDocFieldConsumersPerFieldjava37___at_orgapacheluceneindexDocFieldProcessorPerThreadprocessDocumentDocFieldProcessorPerThreadjava234___at_orgapacheluceneindexDocumentsWriterupdateDocumentDocumentsWriterjava762___at_orgapacheluceneindexDocumentsWriterupdateDocumentDocumentsWriterjava745___at_orgapacheluceneindexIndexWriterupdateDocumentIndexWriterjava2215___at_orgapacheluceneindexIndexWriterupdateDocumentIndexWriterjava2187___at_orgapachesolrupdateDirectUpdateHandler2addDocDirectUpdateHandler2java238___at_orgapachesolrupdateprocessorRunUpdateProcessorprocessAddRunUpdateProcessorFactoryjava60___at_orgapachesolrhandlerXMLLoaderprocessUpdateXMLLoaderjava140___at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava69___at_orgapachesolrhandlerContentStreamHandlerBasehandleRequestBodyContentStreamHandlerBasejava54___at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131___at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1333___at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava303___at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava232___at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089___at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365___at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216___at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181___at_orgmortbayjettyhan request: http://localhost:8983/solr/mysolrcollection/update?commit=true&waitFlush=false&waitSearcher=false&wt=javabin 

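When I look at the Solr collection in the ColdFusion Administrator, it is far larger than the original Verity collection: the existing Verity collection is about 84-85 MB and contains 9,000+ documents, while this one is 1.3 GB with only 847 documents.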

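This search feature is critical to the application, and I am worried that if the migration to Solr does not work, we will have to postpone the upgrade to CF10.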

2 answers:

Answer 0: (score: 0)

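This sounds like a one-time import process. Have you tried batching the results so that each iteration indexes 500 documents? In my experience, ColdFusion does not behave well when a page runs for more than 1 minute.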
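
A minimal sketch of what such batching might look like, assuming the getDocs query from the question has already run on the page. The batchSize variable and the solrIndex log file name are hypothetical, and action="update" is swapped in for the question's action="refresh" so that each document is added individually. This is untested against the failing collection and is not guaranteed to avoid the heap error.

```cfm
<cfsetting requesttimeout="3600" />

<!--- Assumption: getDocs is the query from the question, already executed above. --->
<cfset batchSize = 500>

<!--- Outer loop: walk the result set in steps of batchSize rows. --->
<cfloop from="1" to="#getDocs.recordCount#" step="#batchSize#" index="startRow">
    <cfset endRow = min(startRow + batchSize - 1, getDocs.recordCount)>

    <!--- Inner loop: index one file per row within the current batch. --->
    <cfloop query="getDocs" startrow="#startRow#" endrow="#endRow#">
        <cfindex
            collection="mysolrcollection"
            action="update"
            type="file"
            key="#getDocs.document_file#"
            title="#getDocs.description#"
            custom1="#getDocs.itemNo#"
            custom2="#getDocs.status#"
            custom3="#getDocs.edition#" />
    </cfloop>

    <!--- Record progress so a failure can be traced to a specific batch. --->
    <cflog file="solrIndex" text="Indexed rows #startRow# to #endRow# of #getDocs.recordCount#">
</cfloop>
```

If a single request still runs too long, the same row range could instead be passed in as a URL parameter so that each batch runs as its own request.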

Answer 1: (score: 0)

Make sure that Cumulative Hot Fix 2 for ColdFusion 9.0.1 is installed.

Cumulative Hot Fix 2 | ColdFusion 9.0.1

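The hotfix contains some major bug fixes for Solr, particularly for indexing .PDF files. Alternatively, install ColdFusion 9.0.2, but note that it no longer supports Verity, so you would not be able to switch between Verity and Solr.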