Word document is not indexed as an attachment type in Elasticsearch

Time: 2018-06-20 07:15:12

Tags: java elasticsearch kibana elastic-stack

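I am trying to index the Word documents on my D:\ drive. I have installed the ingest-attachment plugin, and I convert each Word document to Base64 before indexing it. The Java code below indexes the document successfully.

The problem is that the document is not indexed as an attachment; Elasticsearch indexes it as a normal document. What exact changes do I need to make so that my Word document is indexed as an attachment?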

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Base64;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.rest.RestStatus;

public class IndexWordDocument {
    public static void main(String[] args) throws IOException {
        String filePath = "D:\\\\1SearchEngine\\Karthikeyan A S_10641516.docx";
        File file = new File(filePath);

        // Read the whole file and Base64-encode it.
        byte[] bytes = Files.readAllBytes(file.toPath());
        String encodedfile = Base64.getEncoder().encodeToString(bytes);
        System.out.println(encodedfile);

        // Build the document; the Base64 string is stored in the "resume" field.
        Map<String, Object> jsonMap = new HashMap<>();
        jsonMap.put("Name", "Karthikeyan");
        jsonMap.put("postDate", new Date());
        jsonMap.put("resume", encodedfile);

        IndexRequest request = new IndexRequest("posts", "doc", "1")
                .source(jsonMap);

        // try-with-resources closes the client once the request has finished.
        try (RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"),
                        new HttpHost("localhost", 9201, "http")))) {
            IndexResponse response = restHighLevelClient.index(request);
            System.out.println(response.getResult());
        } catch (ElasticsearchException e) {
            if (e.status() == RestStatus.CONFLICT) {
                // Version conflict; no recovery is attempted here.
                System.out.println("Version conflict: " + e.getMessage());
            }
        }
    }
}

I am using Elasticsearch 6.2.3 and communicate with the Elasticsearch server through RestHighLevelClient.

Using the Java API, I want the indexed document to end up in the following format:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "attach",
        "_type": "profile",
        "_id": "101",
        "_score": 1,
        "_source": {
          "resume": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
          "attachment": {
            "content_type": "application/rtf",
            "language": "ro",
            "content": "Lorem ipsum dolor sit amet",
            "content_length": 28
          }
        }
      }
    ]
  }
}
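
For reference, the `resume` value in this result is simply the Base64 form of the raw file bytes, and the `attachment` object is what the ingest processor derives from it. The following is a minimal sketch of the encoding step using only the JDK; the class name `AttachmentEncodingSketch` and its two helper methods are hypothetical names chosen for illustration, and the RTF text is assumed to be what the sample payload decodes to.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper class, not part of any Elasticsearch API.
class AttachmentEncodingSketch {

    // Reads the whole file and returns its bytes as a standard Base64 string.
    static String encodeFile(Path path) throws IOException {
        return encodeBytes(Files.readAllBytes(path));
    }

    // Encodes raw bytes with the standard (non-URL-safe) Base64 alphabet.
    static String encodeBytes(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        // Assumed plain-text form of the sample "resume" payload.
        String rtf = "{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }";

        // Write it to a temporary file so encodeFile is exercised end to end.
        Path tmp = Files.createTempFile("sample", ".rtf");
        try {
            Files.write(tmp, rtf.getBytes(StandardCharsets.US_ASCII));
            System.out.println(encodeFile(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Run as written, this should print the same string as the `resume` value above. `Files.readAllBytes` reads the complete file, whereas a single `InputStream.read(byte[])` call is not guaranteed to fill the buffer.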

I indexed this document using the following Kibana queries:

PUT _ingest/pipeline/attach
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "resume"
      }
    }
  ]
}
PUT attach/profile/101?pipeline=attach
{
  "resume": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
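
The Kibana request names the pipeline in the URL (`?pipeline=attach`), while the Java request in the question does not name any pipeline, which may be why the document is stored without an `attachment` object. Below is a rough sketch, using only the JDK, of how the same PUT could be assembled from Java. The class `PipelineIndexSketch` and its methods are hypothetical, and the base URL `http://localhost:9200` is an assumption taken from the host and port in the question. With RestHighLevelClient, the corresponding setting is presumably `IndexRequest.setPipeline("attach")`; treat that as an assumption to check against the 6.2 client documentation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any Elasticsearch API.
class PipelineIndexSketch {

    // Builds the document URL with the ingest pipeline as a query parameter,
    // mirroring "PUT attach/profile/101?pipeline=attach".
    // Assumes every argument is already URL-safe.
    static String buildIndexUrl(String baseUrl, String index, String type,
                                String id, String pipeline) {
        return baseUrl + "/" + index + "/" + type + "/" + id + "?pipeline=" + pipeline;
    }

    // Wraps the Base64 payload in a one-field JSON body.
    // Standard Base64 output contains no characters that need JSON escaping;
    // the field name is assumed to need none either.
    static String buildBody(String field, String base64) {
        return "{\"" + field + "\":\"" + base64 + "\"}";
    }

    // Sends the PUT and returns the HTTP status code.
    // Needs a reachable Elasticsearch node, so main does not call it.
    static int put(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) {
        String url = buildIndexUrl("http://localhost:9200", "attach", "profile", "101", "attach");
        String body = buildBody("resume",
                "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=");
        // Print the request instead of sending it.
        System.out.println("PUT " + url);
        System.out.println(body);
    }
}
```

The point of the sketch is only that the pipeline name travels with the index request itself; the pipeline definition from the first Kibana request must already exist on the cluster.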

0 answers:

No answers