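I'm trying to index the Word documents on my D:\ drive. I have installed the ingest-attachment plugin, and I'm converting each Word document to base64 and indexing it with the Java code below.
The problem is that the document is not indexed as an attachment: Elasticsearch stores it as a normal document. What exactly do I need to change so that my Word documents are indexed as attachments?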
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.rest.RestStatus;

public class IndexWordDocument {
    public static void main(String[] args) throws IOException {
        String filePath = "D:\\1SearchEngine\\Karthikeyan A S_10641516.docx";

        // Read the whole .docx file (a single read() call may not fill the buffer)
        // and encode it as base64 text.
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encodedFile = Base64.getEncoder().encodeToString(bytes);

        // try-with-resources closes the client once indexing is finished.
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"),
                                   new HttpHost("localhost", 9201, "http")))) {
            Map<String, Object> jsonMap = new HashMap<>();
            jsonMap.put("Name", "Karthikeyan");
            jsonMap.put("postDate", new Date());
            jsonMap.put("resume", encodedFile); // base64 content of the Word file

            IndexRequest request = new IndexRequest("posts", "doc", "1").source(jsonMap);
            try {
                IndexResponse response = client.index(request);
                System.out.println(response.getResult());
            } catch (ElasticsearchException e) {
                if (e.status() == RestStatus.CONFLICT) {
                    System.err.println("Version conflict on posts/doc/1");
                } else {
                    throw e;
                }
            }
        }
    }
}
I'm using Elasticsearch 6.2.3 and talking to the Elasticsearch server through RestHighLevelClient.
Using the Java API, I want the document to end up indexed in the following format:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "attach",
        "_type": "profile",
        "_id": "101",
        "_score": 1,
        "_source": {
          "resume": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
          "attachment": {
            "content_type": "application/rtf",
            "language": "ro",
            "content": "Lorem ipsum dolor sit amet",
            "content_length": 28
          }
        }
      }
    ]
  }
}
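The `resume` value in the hit above is simply the base64 encoding of the raw file (here a tiny RTF document); the `attachment` object next to it is what the ingest-attachment processor adds after parsing that file. A minimal sketch, using only the JDK, that decodes the sample value to make this visible (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: decodes the base64 "resume" value from the search hit.
final class DecodeResumeSample {
    // Copied verbatim from the "_source.resume" field above.
    static final String RESUME_B64 =
            "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=";

    // Base64 text -> original bytes -> string (the sample file is plain RTF text).
    static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the RTF source whose text Tika extracted into "attachment.content".
        System.out.println(decode(RESUME_B64));
    }
}
```

So the base64 field alone carries no extracted text; the `attachment` fields only appear when the document passes through an ingest pipeline containing the `attachment` processor.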
I indexed that document with the following Kibana requests:
PUT _ingest/pipeline/attach
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "resume"
      }
    }
  ]
}
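The same pipeline could also be created from Java. A minimal sketch, assuming JDK 11+ and plain HTTP via `java.net.http` rather than RestHighLevelClient; the class and method names are hypothetical. It only builds the request; sending it would be `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())` against a running cluster.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds (but does not send) PUT _ingest/pipeline/<id>.
final class AttachPipelineRequest {

    // Same definition as the Kibana request: one "attachment" processor
    // that reads base64 content from sourceField.
    static String pipelineJson(String sourceField) {
        return "{"
                + "\"description\":\"Extract attachment information\","
                + "\"processors\":[{\"attachment\":{\"field\":\"" + sourceField + "\"}}]"
                + "}";
    }

    static HttpRequest build(String host, String pipelineId, String sourceField) {
        URI uri = URI.create("http://" + host + "/_ingest/pipeline/" + pipelineId);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(
                        pipelineJson(sourceField), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("localhost:9200", "attach", "resume");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(pipelineJson("resume"));
    }
}
```

The `field` value must match the key under which the Java code stores the base64 string (`resume`).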
PUT attach/profile/101?pipeline=attach
{
  "resume": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
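The difference between this Kibana request and the Java code is the `?pipeline=attach` query parameter, which routes the document through the ingest pipeline. A minimal sketch of that request built with the JDK 11+ HTTP client (class and method names are hypothetical; it builds the request without sending it). With RestHighLevelClient, the corresponding setting is, as far as I know, `IndexRequest#setPipeline(String)`, e.g. `new IndexRequest("posts", "doc", "1").source(jsonMap).setPipeline("attach")`.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds (but does not send) PUT /<index>/<type>/<id>?pipeline=<pipeline>.
final class AttachIndexRequest {

    // Wraps raw file bytes in the JSON body the "attach" pipeline expects:
    // the base64 text goes into the field named by the processor ("resume").
    static String body(byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        // Base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=', so no JSON escaping is needed.
        return "{\"resume\":\"" + encoded + "\"}";
    }

    static HttpRequest build(String host, String index, String type, String id,
                             String pipeline, byte[] fileBytes) {
        URI uri = URI.create("http://" + host + "/" + index + "/" + type + "/" + id
                + "?pipeline=" + pipeline);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body(fileBytes), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        byte[] sample = "Lorem ipsum".getBytes(StandardCharsets.UTF_8);
        HttpRequest req = build("localhost:9200", "attach", "profile", "101", "attach", sample);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(body(sample));
    }
}
```

For the Word file, `sample` would be replaced by `Files.readAllBytes(...)` of the `.docx`, and the index/type/id by `posts`, `doc`, `1`.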