ElasticSearch river from Mongo is messing up field mappings

Time: 2014-07-09 19:59:20

Tags: mongodb elasticsearch elasticsearch-mongo-river

I'm using Mongo, Elastic Search, and this river plugin: https://github.com/richardwilly98/elasticsearch-river-mongodb

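I've successfully set everything up, in that the river keeps the ES data up to date when Mongo is updated. However, the river copies every property of the Mongo document straight into ES, and I only want a small subset of them. E.g. if a Mongo doc has 30 properties, all of them get put into ES instead of just the 5 I want. I assume the problem is with the mapping, and I've followed several docs and another Stack Overflow thread (curl -X POST -d @mapping.json + mapping not created), but it still isn't working for me. This is what I'm doing: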

I create my index with:

curl -XPOST "http://localhost:9200/mongoindex" -d @index.json

index.json:

{
  "settings" : {
      "number_of_shards" : 1
  },
  "analysis" : {
    "analyzer" : {
      "str_search_analyzer" : {
        "tokenizer" : "keyword",
        "filter" : ["lowercase"]
      },
      "str_index_analyzer" : {
         "tokenizer" : "keyword",
         "filter" : ["lowercase", "ngram"]
      }
    },
    "filter" : {
      "ngram" : {
        "type" : "ngram",
        "min_gram" : 2,
        "max_gram" : 20
      }
    }
  }
}
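One thing that may be worth verifying here: as far as I know, the create-index API expects custom analyzers under settings.analysis, and whether a top-level "analysis" key next to "settings" is picked up may depend on the Elasticsearch version. The following is a minimal Python sketch, not the plugin's or the asker's method, that moves a top-level "analysis" block under "settings" before the body is posted. The helper name nest_analysis is hypothetical, and the sample body is a shortened copy of index.json.

```python
import json


def nest_analysis(index_body):
    """Return a copy of the body with a top-level "analysis" moved under "settings".

    Hypothetical helper for illustration only.
    """
    body = json.loads(json.dumps(index_body))  # deep copy via a JSON round trip
    if "analysis" in body:
        body.setdefault("settings", {})["analysis"] = body.pop("analysis")
    return body


# Shortened version of index.json from the question.
index_body = {
    "settings": {"number_of_shards": 1},
    "analysis": {
        "analyzer": {
            "str_search_analyzer": {"tokenizer": "keyword", "filter": ["lowercase"]}
        }
    },
}

fixed = nest_analysis(index_body)
print(json.dumps(fixed, indent=2))
```

The original dictionary is left untouched, so both variants can be compared before deciding which one to send.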

Then I run:

curl -XPOST "http://localhost:9200/mongoindex/listing/_mapping" -d @mapping.json

with this data (mapping.json):

{
   "listing":{
      "properties":{
        "_all": {
          "enabled": false
        },
        "title": {
          "type": "string",
          "store": false,
          "index": "not_analyzed"
        },
        "bathrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "bedrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "address": {
          "type": "nested",
          "include_in_parent": true,
          "store": true,
            "properties": {
              "counrty": {
                "type":"string"
              },
              "city": {
                "type":"string"
              },
              "stateOrProvince": {
                "type":"string"
              },
              "fullStreetAddress": {
                "type":"string"
              },
              "postalCode": {
                "type":"string"
              }
            }
        },
        "location": {
          "type": "geo_point",
          "full_name": "geometry.coordiantes",
          "store": true
        }
      }
   }
}

Then, finally, I create the river:

curl -XPUT "http://localhost:9200/_river/mongoindex/_meta" -d @river.json

river.json:

{
  "type": "mongodb",
  "mongodb": {
    "db": "blueprint",
    "collection": "Listing",
    "options": {
      "secondary_read_preference": true,
      "drop_collection": true
    }
  },
  "index": {
    "name": "mongoindex",
    "type": "listing"
  }
}
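The three curl steps above could also be scripted so that the order (index, then mapping, then river) is explicit. The following is a minimal Python sketch that only builds the requests and does not send them. The helper names build_request and setup_requests are hypothetical, and the bodies passed in are placeholders for the contents of index.json, mapping.json, and river.json.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # same host and port as the curl commands


def build_request(method, path, body):
    """Build (but do not send) a JSON request against Elasticsearch."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ES + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def setup_requests(index_body, mapping_body, river_body):
    """Return the three requests in the order used in the question."""
    return [
        build_request("POST", "/mongoindex", index_body),
        build_request("POST", "/mongoindex/listing/_mapping", mapping_body),
        build_request("PUT", "/_river/mongoindex/_meta", river_body),
    ]


# Placeholder bodies; in practice these would be loaded from the JSON files.
steps = setup_requests(
    {"settings": {"number_of_shards": 1}},
    {"listing": {"properties": {}}},
    {"type": "mongodb"},
)
for req in steps:
    print(req.get_method(), req.full_url)
```

Each request could then be sent with urllib.request.urlopen(req) once the bodies are filled in. Keeping the river last reflects the assumption that the mapping should already exist when the river starts indexing.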

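After all that, the river works in that ES gets populated, but right now it is a verbatim copy of Mongo. I need to modify the mapping, but it just isn't taking effect. What am I missing?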

This is what my mapping looks like after the river runs, which is not how I want it to look.

[Image: ES mapping]

2 answers:

Answer 0 (score: 0)

I would set dynamic mapping to false:

"Dynamic creation of mappings for unmapped types can be completely disabled by setting index.mapper.dynamic to false."

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html

Others have had a similar problem, and so far the best solution appears to be preventing the MongoDB River from mapping dynamically:

https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/75
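As a rough illustration of the setting quoted above, here is a minimal Python sketch that builds an index-creation body with index.mapper.dynamic set to false. It assumes an Elasticsearch version of that era in which the setting exists, and the helper name index_settings is hypothetical.

```python
import json


def index_settings(dynamic_types=False, shards=1):
    """Build a create-index body that sets index.mapper.dynamic.

    Hypothetical helper; the setting name comes from the documentation
    quoted in this answer.
    """
    return {
        "settings": {
            "number_of_shards": shards,
            "index.mapper.dynamic": dynamic_types,
        }
    }


body = index_settings()
print(json.dumps(body, indent=2))
```

The printed JSON could be saved as index.json and posted with the same curl command used in the question.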

Answer 1 (score: 0)

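It turns out the problem was that the dynamic property had been left out of the mapping config. It should be in 2 places, as shown above in index.json, and also in mappings.json: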

{
   "listing":{
      "_source": {
        "enabled": false
      },
      "dynamic": false,      // <--- Need to add this
      "properties":{
        "_all": {
          "enabled": false
        },
        "title": {
          "type": "string",
          "store": false,
          "index": "str_index_analyzer"
        },
        "bathrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "bedrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "address": {
          "type": "nested",
          "include_in_parent": true,
          "store": true,
            "properties": {
              "counrty": {
                "type":"string",
                "index": "str_index_analyzer"
              },
              "city": {
                "type":"string",
                "index": "str_index_analyzer"
              },
              "stateOrProvince": {
                "type":"string",
                "index": "str_index_analyzer"
              },
              "fullStreetAddress": {
                "type":"string",
                "index": "str_index_analyzer"
              },
              "postalCode": {
                "type":"string"
              }
            }
        },
        "location": {
          "type": "geo_point",
          "full_name": "geometry.coordiantes",
          "store": true
        }
      }
   }
}
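The same change can be applied to an existing mapping file programmatically. This is a minimal Python sketch under the assumption that the mapping is keyed by the type name ("listing" here), as in the JSON above. The helper name with_dynamic_disabled is hypothetical, and the sample mapping is shortened.

```python
import json


def with_dynamic_disabled(mapping, type_name="listing"):
    """Return a copy of the mapping with "dynamic": false set on the type.

    Hypothetical helper for illustration only.
    """
    patched = json.loads(json.dumps(mapping))  # deep copy via a JSON round trip
    patched[type_name]["dynamic"] = False
    return patched


# Shortened version of the mapping from the question.
mapping = {
    "listing": {
        "properties": {
            "title": {"type": "string"},
            "bedrooms": {"type": "integer"},
        }
    }
}

patched = with_dynamic_disabled(mapping)
print(json.dumps(patched, indent=2))
```

Because the helper returns a copy, the unpatched mapping stays available for comparison.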

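902 docs vs 451: I believe this is a bug in the ElasticSearch Head plugin that I use to browse the documents. Nothing is duplicated, but several places show 902 docs in various summaries.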