Question

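I used to use the now-deprecated mapper-attachments plugin, which was fairly easy to use with normal indexing. Now that ingest-attachment has replaced it and requires a pipeline, etc., it has become confusing how to use it properly.

Let's say I have a model named Media, which has a file field containing the Base64-encoded file. I have the following mappings in that model: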

mapping '_source' => { :excludes => ['file'] } do
  indexes :id, type: :long, index: :not_analyzed
  indexes :name, type: :text
  indexes :visibility, type: :integer, index: :not_analyzed
  indexes :created_at, type: :date, include_in_all: false
  indexes :updated_at, type: :date, include_in_all: false

  # attachment specific mappings
  indexes 'attachment.title', type: :text, store: 'yes'
  indexes 'attachment.author', type: :text, store: 'yes'
  indexes 'attachment.name', type: :text, store: 'yes'
  indexes 'attachment.date', type: :date, store: 'yes'
  indexes 'attachment.content_type', type: :text, store: 'yes'
  indexes 'attachment.content_length', type: :integer, store: 'yes'
  indexes 'attachment.content', term_vector: 'with_positions_offsets', type: :text, store: 'yes'
end

I created an attachment pipeline via curl:

curl -XPUT 'localhost:9200/_ingest/pipeline/attachment' -d'
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "file"
      }
    }
  ]
}'

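Previously, a simple Media.last.__elasticsearch__.index_document was enough to index the record together with the actual file via the mapper-attachments plugin.

I'm not sure how to do this with ingest-attachment, using a pipeline and the elasticsearch-rails gem.

I can do the following PUT via curl: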

curl -XPUT 'localhost:9200/assets/media/68?pipeline=attachment' -d'
{ "file" : "my_really_long_encoded_file_string" }'

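This indexes the encoded file, but obviously it doesn't index the rest of the model's data (and it completely overwrites the document if it was previously indexed). I really don't want to have to include every model attribute along with the file in a curl command. Is there a better or simpler way of doing this? Am I just completely off on how pipelines and ingest are supposed to work?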

Answer 1

Finally figured it out. I needed to update the ES gems, specifically elasticsearch-api.

With the mapping and pipeline set up, you can simply do:

Media.last.__elasticsearch__.index_document pipeline: :attachment

or

Media.last.__elasticsearch__.update_document pipeline: :attachment

This will index everything properly, and your file will be parsed and indexed correctly via the ingest pipeline.
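
For anyone wondering why the extra option is enough, here is a minimal sketch. It assumes (as far as I can tell from the elasticsearch-model source) that index_document merges whatever options it is given into the request it hands to the client's index call, so pipeline: :attachment ends up as the same ?pipeline=attachment parameter used in the curl PUT from the question. FakeClient and build_index_request are hypothetical stand-ins so the snippet runs without a cluster; the index and type names are taken from that curl example.

```ruby
require 'base64'

# Hypothetical stand-in for Elasticsearch::Client: it only records the
# request it receives instead of sending it to a cluster.
class FakeClient
  attr_reader :last_request

  def index(request)
    @last_request = request
  end
end

# Hypothetical helper mirroring the assumed behaviour of index_document:
# build the base request from the document, then merge the caller's
# options (such as pipeline:) on top of it.
def build_index_request(id, document, options = {})
  { index: 'assets', type: 'media', id: id, body: document }.merge(options)
end

# The document carries the normal model attributes plus the
# Base64-encoded file, all in one request body.
document = {
  name: 'report.pdf',
  visibility: 1,
  file: Base64.strict_encode64('raw file bytes')
}

client = FakeClient.new
client.index(build_index_request(68, document, pipeline: :attachment))

request = client.last_request
puts request[:pipeline]           # pipeline name that travels with the request
puts request[:body].keys.inspect  # model attributes and file together
```

Because the attributes and the file go out in a single request, nothing gets overwritten the way it does with the file-only curl PUT. An older elasticsearch-api gem would presumably not recognise the pipeline parameter, which would explain why updating the gems was necessary.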
