Mysterious EOF error message when updating a SOLR index

Time: 2017-08-23 17:37:59

Tags: json solr updates eof

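I am using atomic updates to update the metadata in a SOLR document collection. To do so, I use an external .json file that lists every document ID in the collection together with its possible metadata, and I submit the requested updates with the "set" command. However, as soon as the external file is larger than about 8200 bytes / 220 lines, I get the following error message: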

"org.apache.solr.common.SolrException: Cannot parse provided JSON: Unexpected EOF: char=(EOF),position=8191 BEFORE=''"

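This does not seem to be related to the actual content of the file (missing brackets or the like), since I reproduced it with different databases. Moreover, if I cut the external file into smaller pieces of less than 8000 bytes, the update works perfectly. Does anyone know where this might come from?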
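The workaround of cutting the file into smaller pieces can be automated. Below is a minimal sketch, assuming the update file is a JSON array of atomic-update documents; the field name `author_s`, the document IDs, and the helper `split_updates` are hypothetical and only serve the illustration. The 8000-byte threshold is the one reported above.

```python
import json


def split_updates(docs, max_bytes=8000):
    """Group documents into batches whose JSON encoding stays within max_bytes.

    A single document that is larger than max_bytes on its own still ends up
    alone in a batch, so such a batch can exceed the limit.
    """
    batches, current = [], []
    for doc in docs:
        candidate = current + [doc]
        size = len(json.dumps(candidate).encode("utf-8"))
        if current and size > max_bytes:
            # Adding this document would exceed the limit: close the batch.
            batches.append(current)
            current = [doc]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


# Hypothetical atomic updates: each one sets one metadata field on an existing id.
docs = [{"id": f"doc{i}", "author_s": {"set": f"Author {i}"}} for i in range(500)]
batches = split_updates(docs)
print(len(batches), "batches")
```

Each batch can then be written to its own file and posted separately. This only works around the symptom; it does not explain the limit itself.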

The curl command used to update the collection is as follows:

curl 'http://localhost:8983/solr/these/update/json?commit=true' -d @test5.json
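For comparison, here is a sketch of the same request built in Python with an explicit `Content-Type: application/json` header. Note that `curl -d` sends `application/x-www-form-urlencoded` by default and strips newlines when reading from a file; whether that plays any role in the error above is not established in this thread. The document ID, the field `title_s`, and the helper `build_update_request` are hypothetical.

```python
import json
import urllib.request

# URL taken from the curl command in the question.
SOLR_UPDATE_URL = "http://localhost:8983/solr/these/update/json?commit=true"


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying the documents as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical atomic update: set one field on one existing document.
request = build_update_request([{"id": "doc1", "title_s": {"set": "New title"}}])

# Sending requires a running Solr instance, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```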

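The main SOLR configuration file is included below this message. I can provide the JSON update file if needed, and I am happy to supply any other details.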

Thanks in advance for your help,

Barthélemy

    <?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!-- 
 This is a DEMO configuration highlighting elements
 specifically needed to get this example running
 such as libraries and request handler specifics.

 It uses defaults or does not define most of production-level settings
 such as various caches or auto-commit policies.

 See Solr Reference Guide and other examples for
 more details on a well configured solrconfig.xml
 https://cwiki.apache.org/confluence/display/solr/The+Well-Configured+Solr+Instance
-->

<config>
  <!-- Controls what version of Lucene various components of Solr
   adhere to.  Generally, you want to use the latest version to
   get all bug fixes and improvements. It is highly recommended
   that you fully re-index after changing this setting as it can
   affect both how text is indexed and queried.
  -->
  <luceneMatchVersion>6.6.0</luceneMatchVersion>

  <!-- Load Data Import Handler and Apache Tika (extraction) libraries -->
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/contrib/langid/lib" regex=".*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-.*\.jar"/>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">tika-data-config.xml</str>
    </lst>
  </requestHandler>


  <updateRequestProcessorChain name="langid" default="true" onError = "skip">
     <processor  class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"
       onError = "continue">
       <str name="langid.fl">text</str>
       <str name="langid.langField">language_s</str>
       <str name="langid.threshold">0.8</str>
       <str name="langid.fallback">en</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" onError = "skip"/>
     <processor class="solr.RunUpdateProcessorFactory" onError = "skip"/>
   </updateRequestProcessorChain>

<!-- The default high-performance update handler -->
  <updateHandler class="solr.DirectUpdateHandler2">

    <!-- Enables a transaction log, used for real-time get, durability, and
         solr cloud replica recovery.  The log can grow as big as
         uncommitted changes to the index, so use of a hard autoCommit
         is recommended (see below).
         "dir" - the target directory for transaction logs, defaults to the
                solr data directory.   -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>

  </updateHandler>

</config>

1 answer:

Answer 0 (score: 0)

Try editing server/etc/jetty.xml and adjusting requestHeaderSize:

    <Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="8192" /></Set>

to a value larger than the size of the file you are sending.
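The numbers line up with this suggestion: the reported error position 8191 is the last zero-based offset within 8192 bytes, which is the default shown in the snippet. As a rough sanity check, assuming the limit really is the cause, the sketch below reads the default from that fragment and compares it with the approximate file size from the question. The helper `default_header_size` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The jetty.xml fragment quoted in the answer.
JETTY_FRAGMENT = (
    '<Set name="requestHeaderSize">'
    '<Property name="solr.jetty.request.header.size" default="8192" />'
    "</Set>"
)


def default_header_size(fragment):
    """Return the default attribute of the <Property> element as an integer."""
    node = ET.fromstring(fragment)
    prop = node.find("Property")
    return int(prop.get("default"))


limit = default_header_size(JETTY_FRAGMENT)
payload_size = 8200  # approximate size in bytes reported in the question

print("limit:", limit)
print("payload exceeds limit:", payload_size > limit)
```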