Question

I am using Logstash to keep my Elasticsearch index in sync with HBase through an API.

Here is my configuration file:

input {
  elasticsearch {
    hosts => ["<elasticsearch_ip>"]
    index => "<some_name>"
    type => "<some_name>"
    query => '{ "query": {
      "bool": {
        "must_not": [
          {"term": {"synced": true}}
        ]
      }
    } }'
  }
}

filter {
  mutate {
    add_field => { "synced" => true }
  }
}

output {
  if [type] == "<some_name>" {
    http {
      format=>"json"
      http_method=>"post"
      url=>"http://<api-ip>/<endpoint>"
    }
    elasticsearch {
      hosts => ["<elasticsearch-ip>"]
      action => "update"
      index => "<some_name>"
      document_type => "<some_name>"
      document_id => "%{document_id}"
    }
  }
}

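I want to add a synced field to the documents so that I don't index them into HBase twice. The problem is that %{document_id} is not resolved to the document's actual _id. I suspect no such field exists, because I tried adding it to the document body with add_field => { "document_id" => "%{document_id}" } and it was not resolved there either. I also tried %{_id} and %{id}, with no luck. What am I doing wrong?

Note: Have I heard of Watcher? Well, sure, I actually implemented this with it first. But have you heard of its price?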

Answer 1

You need to set the docinfo flag to true in your elasticsearch input:

elasticsearch {
  hosts => ["<elasticsearch_ip>"]
  index => "<some_name>"
  type => "<some_name>"
  docinfo => true
  query => '{ "query": {
    "bool": {
      "must_not": [
        {"term": {"synced": true}}
      ]
    }
  } }'
}

Then you can access the document ID in your output using [@metadata][_id].
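
For illustration, the output section from the question could reference that metadata field roughly as follows. This is only a sketch: it assumes docinfo => true is set on the input and that the input plugin writes the document metadata to its legacy default target, @metadata. On plugin versions where ECS compatibility is enabled, the default target is [@metadata][input][elasticsearch] instead, so the field reference would need to change accordingly (or docinfo_target could be set explicitly).

```
output {
  if [type] == "<some_name>" {
    # HTTP output left as in the question
    http {
      format => "json"
      http_method => "post"
      url => "http://<api-ip>/<endpoint>"
    }
    elasticsearch {
      hosts => ["<elasticsearch-ip>"]
      action => "update"
      index => "<some_name>"
      document_type => "<some_name>"
      # Assumption: docinfo => true on the input, metadata stored under @metadata
      document_id => "%{[@metadata][_id]}"
    }
  }
}
```

Fields under @metadata are not serialized with the event, so the _id will not appear in the JSON body posted to the API unless it is first copied into a regular field.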
