Elasticsearch fields from one index appear in another index

Time: 2018-08-06 13:54:06

Tags: elasticsearch logstash logstash-jdbc

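I have created a few indexes in Elasticsearch, with a separate Logstash configuration file for each index. I am using the JDBC driver to fetch data from two different tables in the database. After I changed the mapping of one of the indexes and restarted Logstash, fields from one index started appearing in the second index.

The configurations for the two indexes are shown below.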

# file: contacts-index-logstash.conf
input {
    jdbc {
        jdbc_connection_string =>
        "jdbc:mysql://xxxx.com:3306/xxxx_engine?useSSL=false&autoReconnect=true&useUnicode=yes"
        jdbc_user => "email"
        jdbc_password => "xxxxxxxy"
        jdbc_validate_connection => true
        jdbc_paging_enabled => true
        jdbc_page_size => 500
        jdbc_driver_library => "/home/clodura/mysql-connector-java-5.1.46-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        schedule => "* * * * *"
        statement => "select c.id, c.name, c.description, c.industry, c.comp_size_range, c.specialities, ccd.industry_tags, ccd.social_tags, d.company_type, cw.website, d.menu, d.header, d.cleaned_page_text, cga.city, cga.state, cga.country from companies c left outer join company_calais_data ccd on c.id = ccd.company_id left outer join website_scraped_data d on c.id = d.company_id, company_websites cw, company_geocode_address cga where c.id = cw.company_id and c.id = cga.company_id and c.date_added > '2018-03-01'"
    }
}
output {
    elasticsearch {
#        protocol => http
        index => "clodura"
        document_type => "companies"
        document_id => "%{id}"
        hosts => ["localhost:9200"]
    }
}

Here is the second configuration:

# file: contacts-position-logstash.conf
input {
    jdbc {
        jdbc_connection_string =>
        "jdbc:mysql://xxxxxx.com:3306/xxxxxx_engine?useSSL=false&autoReconnect=true&useUnicode=yes"
        jdbc_user => "email"
        jdbc_password => "xxxxxxxy"
        jdbc_validate_connection => true
        jdbc_paging_enabled => true
        jdbc_page_size => 500
        jdbc_driver_library => "/home/clodura/mysql-connector-java-5.1.46-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        schedule => "* * * * *"
        statement => "select company_id, person_id, position from company_person"
    }
}
output {
    elasticsearch {
        index => "contactposition"
        document_type => "positions"
        document_id => "%{company_id}%{person_id}"
        hosts => ["localhost:9200"]
    }
}

After about an hour, the mapping of the contactposition index had changed to:

{
  "contactposition" : {
    "mappings" : {
      "positions" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "@version" : {
            "type" : "string"
          },
          "city" : {
            "type" : "string"
          },
          "cleaned_page_text" : {
            "type" : "string"
          },
          "comp_size_range" : {
            "type" : "string"
          },
          "company_id" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "company_type" : {
            "type" : "string"
          },
          "country" : {
            "type" : "string"
          },
          "description" : {
            "type" : "string"
          },
          "header" : {
            "type" : "string"
          },
          "id" : {
            "type" : "string"
          },
          "industry" : {
            "type" : "string"
          },
          "industry_tags" : {
            "type" : "string"
          },
          "menu" : {
            "type" : "string"
          },
          "name" : {
            "type" : "string"
          },
          "person_id" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "position" : {
            "type" : "string"
          },
          "social_tags" : {
            "type" : "string"
          },
          "specialities" : {
            "type" : "string"
          },
          "state" : {
            "type" : "string"
          },
          "website" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

How did fields from the clodura index end up in the contactposition index? Please help.

1 answer:

Answer 0: (score: 2)

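You need to add a conditional to your output. Logstash does not process the configuration files independently: when several files are loaded into the same pipeline they are concatenated into one configuration, so events from every input go to every output. That is why rows from the companies query are also written to contactposition and add their fields to its mapping. Tag the events in the input and check for that tag in the output: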

input {
    jdbc {
        ...
        # tags is an option of the input plugin, so it goes inside the jdbc block
        tags => ["contactposition"]
    }
}

output {
  if "contactposition" in [tags] {
    elasticsearch {
        index => "contactposition"
        document_type => "positions"
        document_id => "%{company_id}%{person_id}"
        hosts => ["localhost:9200"]
    }
  }
}
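
The same reasoning applies in the other direction: unless the first file gets a matching change, rows from the company_person query would still be written to clodura. A minimal sketch of that counterpart, assuming the tag name "clodura" (an arbitrary label chosen here, not part of the original answer) and keeping the elided jdbc settings exactly as they appear in the question:

```
# file: contacts-index-logstash.conf
input {
    jdbc {
        ...
        # Assumed tag name; it only needs to differ from "contactposition".
        tags => ["clodura"]
    }
}
output {
    # Only events produced by the tagged input above reach this output.
    if "clodura" in [tags] {
        elasticsearch {
            index => "clodura"
            document_type => "companies"
            document_id => "%{id}"
            hosts => ["localhost:9200"]
        }
    }
}
```

With both files tagged this way, each event carries exactly one of the two tags and matches only its own output. Note that the tag is stored with each document as a `tags` field, and that fields already added to the contactposition mapping are not removed by this change; clearing them would most likely mean recreating that index and loading the data again.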