Indexing attachment files in Elasticsearch

Asked: 2011-08-18 17:43:38

Tags: indexing attachment elasticsearch

I entered the following commands to index a document in Elasticsearch.

Create the index:

curl -X PUT "localhost:9200/test_idx_1x"

Create the mapping:

curl -X PUT "localhost:9200/test_idx_1x/test_mapping_1x/_mapping" -d '{
  "test_mapping_1x": {
    "properties": {
      "my_attachments": {
        "type": "attachment"
      }
    }
  }
}'

Index this document:

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/4' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "test Elastic Search",
  "name": "N1"
}'

All three of these commands work fine. But when I enter this command:

curl -XPOST 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "type": "attachment",
    "_content_type": "text/plain",
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
  }
}'

I get this error message:

{
  "error": "NullPointerException[null]",
  "status": 500
}

I changed it to each of the following:

curl -XPOST 'http://localhost:9200/test_idx_1x/test_mapping_1x/1bis' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "type": "attachment",
    "_content_type": "text/plain",
    "_name": "/inf/bd/my_home_directory/test.txt"
  }
}'

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
  }
}'

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt",
    "_content_type": "text/plain"
  }
}'

Each of them produces the same error.

I then changed it to:

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt",
    "_content_type": "text/plain",
    "content": "... base64 encoded attachment ..."
  }
}'

The error is:

{
  "error": "MapperParsingException[Failed to parse]; nested: JsonParseException[Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal character '.' (code 0x2e) in base64 content\n at [Source: [B@159b3; line: 1, column: 241]]; ",
  "status": 400
}

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
}'

I get this error message:

{
  "error": "MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('h' (code 104)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B@1ae9565; line: 1, column: 132]]; ",
  "status": 400
}

If I enter

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
}'

I get this error, which I can understand:

{
  "error": "MapperParsingException[Failed to parse]; nested: JsonParseException[Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal character ':' (code 0x3a) in base64 content\n at [Source: [B@1ffb7d4; line: 1, column: 137]]; ",
  "status": 400
}

How do I pass an attachment file to ES so that ES can index it?


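Thank you for your answers. The attachment plugin was already installed when I entered these commands. The content of the text file is already encoded in Base64, so I do not encode it again. If I do not use the file's path but put its Base64 content directly in the document, for example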

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": "file's content string encoded in base64"
}'

everything works: the file is posted successfully and I can search its content afterwards.

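But if I replace the content with the file's path, it fails. So I would like to know how to Base64-encode the file on the command line, inside the ES indexing command itself (I do not want to run a base64 command to encode the file first and then type a second command to index it in ES). Regarding your answer, do I have to install something like a "Perl library" to run your command?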

4 answers:

Answer 0 (score: 4)

http://es-cn.medcl.net/tutorials/2011/07/18/attachment-type-in-action.html

#!/bin/sh

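# Slurp the whole file (-0777) and emit unwrapped base64 (empty line ending),
# so the JSON string below contains no raw newlines.
coded=$(perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")' fn6742.pdf)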
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file
curl -X POST "localhost:9200/test/attachment/" -d @json.file

Answer 1 (score: 3)

First, you don't say whether you have installed the attachment plugin. If not, you can install it like this:

./bin/plugin -install mapper-attachments

You need to restart Elasticsearch for the plugin to be loaded.

Then, as you have done above, map your field as type attachment:

curl -XPUT 'http://127.0.0.1:9200/foo/?pretty=1'  -d '
{
   "mappings" : {
      "doc" : {
         "properties" : {
            "file" : {
               "type" : "attachment"
            }
         }
      }
   }
}
'

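When you index a document, the file contents must be encoded in Base64. You can do this on the command line with the base64 utility. However, for the result to be legal JSON, you also need to escape the newlines, which you can do by piping the output of base64 through Perl: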

curl -XPOST 'http://127.0.0.1:9200/foo/doc?pretty=1'  -d '
{
   "file" : "'"$(base64 /path/to/file | perl -pe 's/\n/\\n/g')"'"
}
'
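
For larger files, one possible variation is to stream the JSON body to curl on standard input instead of splicing it into the command line. The sketch below is not part of the original answer: `attachment_json` is a hypothetical helper name, it assumes a `base64` utility that reads standard input, and it strips the line breaks with `tr` rather than escaping them, so Perl is not needed.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON body {"file":"<base64>"} for the file in $1.
attachment_json() {
    # tr removes the line breaks that base64 inserts when it wraps its output
    printf '{"file":"%s"}' "$(base64 < "$1" | tr -d '\n')"
}

# Usage (assumes a node on 127.0.0.1:9200 and the foo/doc mapping from this answer):
# attachment_json /path/to/file | curl -XPOST 'http://127.0.0.1:9200/foo/doc?pretty=1' -d @-
```

Passing `-d @-` tells curl to read the request body from standard input, which keeps encoding and indexing in a single pipeline.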

Now you can search your file:

curl -XGET 'http://127.0.0.1:9200/foo/doc/_search?pretty=1'  -d '
{
   "query" : {
      "text" : {
         "file" : "text to look for"
      }
   }
}
'

For more details, see the Elasticsearch attachment type documentation.

Answer 2 (score: 0)

Here is a complete shell script implementation:

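#!/bin/sh
# Base64-encode the file and escape newlines so the value is a valid JSON string
file_path='/path/to/file'
file=$(base64 "$file_path" | perl -pe 's/\n/\\n/g')
# POST rather than PUT, because no document ID is given in the URL
curl -XPOST "http://eshost.com:9200/index/type/" -d '{
    "file" : { "content" : "'"$file"'" }
}'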
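
The question's starting point was a file that lives at a URL rather than on disk. Since the attachment field expects the Base64 content itself, one way to get there is to download the file and encode it in the same pipeline. The sketch below is only an illustration: `attachment_doc` is a hypothetical helper, the field names `_name`, `_content_type` and `content` are the ones used in the question, and the name and MIME type arguments are inserted without JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: read raw bytes on stdin and print a document whose
# my_attachments field carries them as base64.
# $1 = file name, $2 = MIME type (neither is JSON-escaped here).
attachment_doc() {
    content=$(base64 | tr -d '\n')
    printf '{"my_attachments":{"_name":"%s","_content_type":"%s","content":"%s"}}' \
        "$1" "$2" "$content"
}

# Usage (assumes the CouchDB URL and the index from the question are reachable):
# curl -s 'http://localhost:5984/my_test_couch_db_7/ID2/test.txt' \
#   | attachment_doc test.txt text/plain \
#   | curl -XPOST 'http://localhost:9200/test_idx_1x/test_mapping_1x/' -d @-
```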

Answer 3 (score: 0)

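There is another solution: the plugin from http://elasticwarehouse.org. You can upload a binary file with _ewupload?, read the newly generated ID, and update your other index with this reference.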

Install the plugin:

plugin -install elasticwarehouseplugin -u http://elasticwarehouse.org/elasticwarehouse/elasticsearch-elasticwarehouseplugin-1.2.2-1.7.0-with-dependencies.zip

Restart the cluster, then:

curl -XPOST "http://127.0.0.1:9200/_ewupload?folder=/myfolder&filename=mybinaryfile.bin" --data-binary @mybinaryfile.bin

Example response:

{"id":"nWvrczBcSEywHRBBBwfy2g","version":1,"created":true}
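
Reading the generated ID and storing it as a reference in another index could be scripted roughly as follows. This is a sketch under assumptions: `extract_id` is a hypothetical helper that relies on the compact, single-line response format shown above (a real JSON parser would be more robust), and the target index `myindex/mytype/1` and the field `attachment_id` are made-up names for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pull the "id" value out of a one-line JSON response on stdin.
extract_id() {
    sed -n 's/.*"id":"\([^"]*\)".*/\1/p'
}

# Usage (assumes the plugin is installed and the node runs on 127.0.0.1:9200;
# myindex/mytype/1 and attachment_id are illustrative names only):
# id=$(curl -s -XPOST "http://127.0.0.1:9200/_ewupload?folder=/myfolder&filename=mybinaryfile.bin" \
#        --data-binary @mybinaryfile.bin | extract_id)
# curl -XPUT "http://127.0.0.1:9200/myindex/mytype/1" -d '{"attachment_id":"'"$id"'"}'
```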