Question

I am trying to use this field:

<fieldType name="json" class="solr.TextField"
           positionIncrementGap="100">
</fieldType>
<field name='json_field' type="json" indexed="true" stored="true"
      omitNorms="false" required="false" multiValued="false"/>

I need to read the value of this JSON field inside my complex function:

FunctionValues json = this.json_field.getValues(context, readerContext);

Then I try to get the whole JSON:

json.strVal(doc)

However, I only get some of the tokens, not the full value.

If I try to use

<analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>

I get the error

"SOLR doesn't accept tokens larger than 32766"

Answer 1

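The reason is basically that if you use KeywordTokenizer on a large text, it tries to emit the whole text as one big token, and the size of a single token is limited: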

/**
   * Absolute hard maximum length for a term, in bytes once
   * encoded as UTF8.  If a term arrives from the analyzer
   * longer than this length, an
   * <code>IllegalArgumentException</code>  is thrown
   * and a message is printed to infoStream, if set (see {@link
   * IndexWriterConfig#setInfoStream(InfoStream)}).
   */
  public final static int MAX_TERM_LENGTH = DocumentsWriterPerThread.MAX_TERM_LENGTH_UTF8;
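
To see whether a given JSON document would hit this limit when emitted as a single token, you can measure its UTF-8 encoded size. This is only a minimal sketch using plain Java, not Solr or Lucene code: the class name TermLengthCheck and its methods are made up for illustration, and the value 32766 is taken from the error message in the question rather than read from the Lucene constant.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks a string against the term length limit in UTF-8 bytes. */
final class TermLengthCheck {

    // Assumption: 32766 is the limit quoted in the error message from the question.
    static final int MAX_TERM_BYTES = 32766;

    /** Returns the size of the value in bytes once encoded as UTF-8. */
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the value, emitted as one single token, would be longer than the limit. */
    static boolean exceedsLimit(String value) {
        return utf8Length(value) > MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        String small = "{\"id\":1}";
        // Build a JSON string with a 40,000 character payload.
        String large = "{\"data\":\"" + "x".repeat(40_000) + "\"}";

        System.out.println("small exceeds limit: " + exceedsLimit(small));
        System.out.println("large exceeds limit: " + exceedsLimit(large));
    }
}
```

Note that the limit is counted in bytes, not characters, so a JSON string with many non-ASCII characters can exceed it well before it reaches 32766 characters.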

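There is no way to get the full JSON other than saving it unanalyzed, for example by setting indexed=false. But then you will not be able to search this JSON; it will only be stored as-is. Is that really what you need?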

Is there a way to store a large JSON in Solr without tokenization and get it in a complex Solr function?
