Best practice for searching attachments in documents (more than 2k documents with attachments)

Time: 2018-06-24 07:44:18

Tags: java elasticsearch elastic-stack

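I am using the Java API to fetch indexed documents from Elasticsearch. However, when the index holds a large number of documents (2k+), the search returns null as the response.

When the index holds fewer than 500 documents, the Java API code below works fine.

The problem only appears once the index contains more documents. (Could this be a performance issue during fetching?)

I use the ingest attachment processor plugin for attachments, and each document has a PDF attached.

However, if I run the same query from Kibana as a curl-style script, I do get a response and can view the results in Kibana.

Please find my Java code below.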

private final static String ATTACHMENT = "document_attachment";
private final static String TYPE = "doc";

public static void main(String args[])
{
    RestHighLevelClient restHighLevelClient = null;

    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

    SearchRequest contentSearchRequest = new SearchRequest(ATTACHMENT); 
    SearchSourceBuilder contentSearchSourceBuilder = new SearchSourceBuilder();
    contentSearchRequest.types(TYPE);
    QueryStringQueryBuilder attachmentQB = new QueryStringQueryBuilder("Activa"); 
    attachmentQB.defaultField("attachment.content");
    contentSearchSourceBuilder.query(attachmentQB);
    contentSearchSourceBuilder.size(50);
    contentSearchRequest.source(contentSearchSourceBuilder);
    SearchResponse contentSearchResponse = null;

    try {
        contentSearchResponse = restHighLevelClient.search(contentSearchRequest); // returning null response
    } catch (IOException e) {
        e.getLocalizedMessage();
    }
    System.out.println("Request --->"+contentSearchRequest.toString());
    System.out.println("Response --->"+contentSearchResponse.toString());

    SearchHit[] contentSearchHits = contentSearchResponse.getHits().getHits();
    long contenttotalHits=contentSearchResponse.getHits().totalHits;
    System.out.println("condition Total Hits --->"+contenttotalHits);
}
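
A note on the code above: if `restHighLevelClient.search(...)` throws, the catch block only calls `e.getLocalizedMessage()` and discards the result, so `contentSearchResponse` stays null and the next `toString()` call fails. That would be consistent with the `NullPointerException` reported at the end of this post. Below is a minimal sketch of a guard, using only the JDK. `NullGuardSketch`, `search()` and `searchSafely()` are hypothetical stand-ins, not Elasticsearch API; the real call would take the place of `search()`.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical stand-in class; no Elasticsearch dependency is used here.
class NullGuardSketch {

    // Stand-in for restHighLevelClient.search(...). It always fails so that
    // the error path can be demonstrated.
    static String search() throws IOException {
        throw new IOException("entity content is too long");
    }

    // Reports the failure and returns an empty Optional instead of leaving
    // a null reference behind for later code to dereference.
    static Optional<String> searchSafely() {
        try {
            return Optional.of(search());
        } catch (IOException e) {
            System.err.println("Search failed: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> response = searchSafely();
        if (response.isPresent()) {
            System.out.println("Response --->" + response.get());
        } else {
            System.out.println("No response, skipping hit processing");
        }
    }
}
```

With a guard like this the underlying I/O error is reported once, and the hit-processing code is skipped rather than failing with a second, less informative exception.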

Please find below the script I used in Kibana. This script does return a response.

GET document_attachment/_search?pretty
{
  "query" :{
      "match": {"attachment.content": "Activa"}
  }
}
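
Since every hit carries its full `_source`, which here would include the base64 `fileContent` and the extracted `attachment.content`, one thing that might be worth trying is a `_source` filter on the same query so that the large fields are left out of the response. The sketch below only builds such a request body as a plain string with the JDK. `SourceFilterSketch`, `quote` and `buildBody` are hypothetical names, and whether excluding these fields is acceptable for the application is an assumption.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of the Elasticsearch client.
class SourceFilterSketch {

    // Wraps a value in double quotes for JSON. Only backslashes and double
    // quotes are escaped, which is enough for plain field names and terms.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a match query body whose _source section excludes the given fields.
    static String buildBody(String field, String text, int size, String... excludes) {
        StringJoiner excluded = new StringJoiner(",", "[", "]");
        for (String e : excludes) {
            excluded.add(quote(e));
        }
        return "{\"size\":" + size
                + ",\"_source\":{\"excludes\":" + excluded + "}"
                + ",\"query\":{\"match\":{" + quote(field) + ":" + quote(text) + "}}}";
    }

    public static void main(String[] args) {
        // Field names are taken from the mapping in this post.
        System.out.println(buildBody("attachment.content", "Activa", 50,
                "fileContent", "attachment.content"));
    }
}
```

In the high-level client, the corresponding setting would presumably be `SearchSourceBuilder.fetchSource(String[] includes, String[] excludes)` on `contentSearchSourceBuilder`.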

Please find below the search request as printed by the Java API.

SearchRequest{searchType=QUERY_THEN_FETCH, indices=[document_attachment], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=[doc], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, source={"size":50,"query":{"match":{"attachment.content":{"query":"Activa","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}}

Please find my mapping details below.

{
  "document_attachment": {
    "mappings": {
      "doc": {
        "properties": {
          "app_language": {
            "type": "text"
          },
          "attachment": {
            "properties": {
              "author": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "content": {
                "type": "text",
                "analyzer": "custom_analyzer"
              },
              "content_length": {
                "type": "long"
              },
              "content_type": {
                "type": "text"
              },
              "date": {
                "type": "date"
              },
              "language": {
                "type": "text"
              },
              "title": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "catalog_description": {
            "type": "text"
          },
          "fileContent": {
            "type": "text"
          }
        }
      }
    }
  }
}

Please find my settings (ingest pipeline) details below.

PUT _ingest/pipeline/document_attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "fileContent"
      }
    }
  ]
}

This error occurs only when I search on attachment.content. If I search on other fields, I get results.

I am using Elasticsearch version 6.2.3.

Please find the error below.

org.apache.http.ContentTooLongException: entity content is too long [105539255] for the configured buffer limit [104857600]
    at org.elasticsearch.client.HeapBufferedAsyncResponseConsumer.onEntityEnclosed(HeapBufferedAsyncResponseConsumer.java:76)
    at org.apache.http.nio.protocol.AbstractAsyncResponseConsumer.responseReceived(AbstractAsyncResponseConsumer.java:131)
    at org.apache.http.impl.nio.client.MainClientExec.responseReceived(MainClientExec.java:315)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseReceived(DefaultClientExchangeHandlerImpl.java:147)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:303)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:255)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
    at com.es.utility.DocumentSearch.main(DocumentSearch.java:88)

0 answers:

No answers yet