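I'm having trouble importing into Solr with Tika: the import keeps crashing on some of my documents when indexing web pages.
As a workaround I delete the content of the documents that Tika fails on and restart the import, but this is very tedious, and I obviously lose the content of those documents.
Here is the crash log: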
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 927
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@b623d7
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
... 8 more
Caused by: java.lang.NullPointerException
Nov 10, 2011 10:51:29 AM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 927
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@b623d7
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
... 8 more
Caused by: java.lang.NullPointerException
Example of data that causes the crash:
pageText=pageText(1.0)={<table width="100%" height="100%" border="0" cellpadding="0" cellspacing="0" nodeIndex="3" class="ril_layoutTable">
<tr nodeIndex="2">
<td width="50%" rowspan="3" nodeIndex="1"> </td>
<td width="1" rowspan="3" nodeIndex="4"></td>
<td nodeIndex="5">
<!-- ImageReady Slices (headergraphics.psd) -->
<table width="780" border="0" cellpadding="0" cellspacing="0" nodeIndex="8" class="ril_layoutTable">
<tr nodeIndex="7">
<td colspan="9" nodeIndex="6">
<table width="780" height="40" border="0" cellpadding="0" cellspacing="0" nodeIndex="11" class="ril_layoutTable">
<tr nodeIndex="10">
<td width="500" nodeIndex="9"> </td>
<td width="135" nodeIndex="12">
<a href="/login.html" nodeIndex="80"></a>
<a href="/login.html" nodeIndex="81"></a>
</td>
<td width="135" nodeIndex="13"> </td>
<td nodeIndex="14"> </td>
</tr>
</table>
</td>
</tr>
<tr nodeIndex="16">
<td nodeIndex="15"></td>
<td nodeIndex="17" childIsOnlyALink="1">
<a href="/index.html" nodeIndex="84"></a>
</td>
<td nodeIndex="18" childIsOnlyALink="1">
<a href="/history.html" nodeIndex="86"></a>
</td>
<td nodeIndex="19" childIsOnlyALink="1">
<a href="/faq.html" nodeIndex="88"></a>
</td>
<td nodeIndex="20" childIsOnlyALink="1">
<a href="/prep.html" nodeIndex="90"></a>
</td>
<td nodeIndex="21"></td>
<td nodeIndex="22" childIsOnlyALink="1">
<a href="/exercises.html" nodeIndex="93"></a>
</td>
<td nodeIndex="23" childIsOnlyALink="1">
<a href="/faq.html?contact=true" nodeIndex="95"></a>
</td>
<td nodeIndex="24"></td>
</tr>
<tr nodeIndex="26">
<td colspan="9" nodeIndex="25"></td>
</tr>
</table><!-- End ImageReady Slices -->
</td>
<td width="1" rowspan="3" nodeIndex="27"></td>
<td width="50%" rowspan="3" nodeIndex="28"> </td>
</tr>
<tr nodeIndex="30">
<td height="100%" valign="top" nodeIndex="29">
<table width="780" border="0" cellpadding="0" cellspacing="0" nodeIndex="33" class="ril_layoutTable">
<tr nodeIndex="32">
<td width="534" valign="top" nodeIndex="31">
<table width="534" border="0" cellpadding="0" cellspacing="0" nodeIndex="36" class="ril_layoutTable">
<tr nodeIndex="35">
<td width="534" valign="top" class="bgdown" nodeIndex="34">
<table cellspacing="0" cellpadding="0" nodeIndex="39" class="ril_layoutTable">
<tr nodeIndex="38">
<td valign="top" width="508" nodeIndex="37">
<!--Begin Content-->
<h2 nodeIndex="40">Welcome to IQTest.com, home of the original online IQ test.</h2>
<p nodeIndex="41" childIsOnlyALink="1">
<a href="/prep.html" nodeIndex="100">Click here</a> to take our free, private, and fun IQ test.</p>
<p nodeIndex="42">
Our original IQ test is the most scientifically valid IQ test available on
the web today. Previously offered only to corporations, schools, and in certified professional applications, it is now available to you. In addition to measuring your general IQ, our exclusive test assesses your performance in 13 different areas of intelligence, revealing your key cognizant
strengths and weaknesses.</p>
<p nodeIndex="43">
Developed by PhDs and statistically sound, our test reflects the best research available.<br nodeIndex="101">
<a href="/prep.html" nodeIndex="102">Click here to begin</a>
<br nodeIndex="103">
<br nodeIndex="104">
</p>
<h2 nodeIndex="44">
<a href="/prep.html" nodeIndex="105">IQTest.com<br nodeIndex="106">
Take the Test</a>
</h2>
<br nodeIndex="107">
<h2 nodeIndex="45">
<strong nodeIndex="108">What is an IQ?
</strong>
</h2>
<p nodeIndex="46">An Intelligence Quotient indicates a person's mental abilities relative to others of approximately the same age. Everyone has hundreds of specific mental
abilities--some can be measured accurately and are reliable predictors of academic and financial success.</p>
<p nodeIndex="47">Read more about <a href="whatisaniqscore.html" nodeIndex="109">Intelligence Testing</a></p>
<!-- End of StatCounter Code -->
<!--End Content-->
<br nodeIndex="113">
<p nodeIndex="48"></p>
</td>
</tr>
</table><!-- </div> -->
</td>
</tr>
<tr nodeIndex="50">
<td nodeIndex="49"></td>
</tr>
</table>
</td>
<!--Begin Sidebar-->
<td height="100%" nodeIndex="51"> </td>
<td width="225" valign="top" nodeIndex="52">
<table class="ril_layoutTable" width="225" border="0" cellpadding="0" cellspacing="0" nodeIndex="55">
<tr nodeIndex="54">
<td nodeIndex="53"></td>
</tr>
<tr nodeIndex="57">
<td width="225" valign="top" nodeIndex="56">
<h4 nodeIndex="118">What does my score mean?</h4>
<p nodeIndex="58">Please <a href="whatisaniqscore.html" nodeIndex="119">click here</a> for an explanation of IQ testing and standard deviation.<br nodeIndex="120">
Please <a href="faq.html#chart" nodeIndex="121">click here</a> for a test score comparison chart.<br nodeIndex="122">
Please <a href="history.html" nodeIndex="123">click here</a> for a history of intelligence testing.</p>
<div align="center" margin="0" nodeIndex="59">
</div>
</td>
</tr>
<tr nodeIndex="61">
<td nodeIndex="60"></td>
</tr>
<tr nodeIndex="63">
<td width="225" valign="top" nodeIndex="62">
<h4 nodeIndex="127">What is the Complete Personal Intelligence Profile?</h4>
<p nodeIndex="64">Your Complete Personal Intelligence Profile will give you much greater detail about the range and variety of your mental abilities. <a href="profileexplain.html" nodeIndex="128">Read More...</a></p>
</td>
</tr>
<tr nodeIndex="66">
<td nodeIndex="65"></td>
</tr>
<tr nodeIndex="68">
<td width="225" valign="top" nodeIndex="67">
<h4 nodeIndex="130">Consciousness Exercises</h4>
<p nodeIndex="69">The Consciousness Exercises are a set of entertaining psycho-spiritual games, puzzles, dialogs, and more, which can expand your awareness. <a href="exercises.html" nodeIndex="131">Read More...</a></p>
</td>
</tr>
<tr nodeIndex="71">
<td nodeIndex="70"></td>
</tr>
</table>
</td>
<!--End Sidebar-->
</tr>
</table>
</td>
</tr>
<tr nodeIndex="73">
<td nodeIndex="72">
<table width="780" border="0" cellpadding="0" cellspacing="0" nodeIndex="76" class="ril_layoutTable">
<tr nodeIndex="75">
<td width="780" height="33" align="center" nodeIndex="74">
<a href="/index.html" nodeIndex="133">Home</a>
<a href="/history.html" nodeIndex="134">History</a>
<a href="/faq.html" nodeIndex="135">FAQ</a>
<a href="/prep.html" nodeIndex="136">Test</a>
<a href="/exercises.html" nodeIndex="137">Consciousness Exercises</a>
<a href="/faq.html?contact=true" nodeIndex="138">Contact Us</a>
<a href="/privacy.html" nodeIndex="139">Privacy Policy</a>
<a href="/remove.html" nodeIndex="140">Unsubscribe</a>
</td>
</tr>
<tr nodeIndex="78">
<td width="780" height="34" align="center" nodeIndex="77">© 2003 -2011 Autumn Group. All rights reserved</td>
</tr>
</table>
</td>
</tr>