Elasticsearch mongo-connector KeyError _id

Date: 2016-04-08 19:04:39

Tags: mongodb elasticsearch

I am using mongo-connector to sync data from a MongoDB replica set to Elasticsearch, with elastic2-doc-manager as the Doc Manager.

I am running mongo-connector like this:

$ mongo-connector --auto-commit-interval=5 --verbose -m 127.0.0.1:27017 -t localhost:9200 -d elastic2_doc_manager --namespace-set=db.collection1,db.collection2 --fields=f1,f2,f3

At some point I get this exception:

Traceback (most recent call last):  
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 85, in wrapped
    func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 261, in run
    docman.upsert(doc, ns, timestamp)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 150, in upsert
    doc_id = u(doc.pop("_id"))

I wrapped the method at line 148 of File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py" in a try/except, so that the offending document is printed when the exception occurs.
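A minimal sketch of that kind of debugging wrapper, written as a decorator instead of an inline try/except. The names `log_bad_document` and `FakeDocManager` are hypothetical: the stand-in `upsert` only imitates the `doc.pop("_id")` line from the traceback and is not mongo-connector's actual code.

```python
import functools
import pprint


def log_bad_document(func):
    """Hypothetical debugging decorator for a doc manager's upsert method.

    If the wrapped call raises KeyError (e.g. a missing "_id"), print the
    namespace and the document, then re-raise the original exception.
    """
    @functools.wraps(func)
    def wrapper(self, doc, namespace, timestamp):
        try:
            return func(self, doc, namespace, timestamp)
        except KeyError:
            print("upsert failed for a document in %s:" % namespace)
            pprint.pprint(doc)
            raise  # keep the original exception and traceback
    return wrapper


class FakeDocManager(object):
    """Stand-in for a doc manager; only imitates the failing line."""

    @log_bad_document
    def upsert(self, doc, namespace, timestamp):
        doc_id = str(doc.pop("_id"))  # raises KeyError if "_id" is absent
        return doc_id, doc
```

Re-raising keeps mongo-connector's own error handling unchanged; the decorator only adds the printed document to the log.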

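Somehow, _id is missing from the printed document. However, if I query Mongo directly from the interactive shell, I can fetch the same document, and the _id key is present.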

So I don't understand why mongo-connector/elastic2_doc_manager does not see the _id attribute on some documents.

2 answers:

Answer 0 (score: 0)

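For whatever reason, mongo-connector seems to remove _id from your documents. However, the string representation of the MongoDB ObjectId is stored as the _id in Elasticsearch. It is still there, just not inside the document itself, or what Elasticsearch calls the "_source".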
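A minimal sketch of the behavior this answer describes, assuming the doc manager pops `_id`, converts it to a string (as the `doc_id = u(doc.pop("_id"))` line in the traceback suggests) and uses it as the Elasticsearch document id. `to_es_action` is a hypothetical helper, not part of mongo-connector, and a plain string stands in for a BSON ObjectId.

```python
def to_es_action(doc, index, doc_type):
    """Hypothetical: split a MongoDB document into ES metadata and _source."""
    source = dict(doc)                # shallow copy; the caller's dict is untouched
    doc_id = str(source.pop("_id"))   # Mongo _id becomes the ES document id
    return {
        "_index": index,
        "_type": doc_type,
        "_id": doc_id,
        "_source": source,            # everything except _id
    }
```

Working on a copy is a design choice of this sketch: calling it twice with the same dict is safe, whereas popping `_id` from the caller's dict in place would raise KeyError on a second call.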

Look at a query result; it is structured like this:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 135513,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myIndex",
      "_type" : "myType",
      "_id" : "5294b93e6c255bb82d0000c0", <-- ID from mongodb
      "_score" : 1.0,
      "_source":{
        "some": "data",
        "my": "document"
      }
    }, {
      "_index" : "myIndex",
      "_type" : "myType",
      "_id" : "5294b93e6c255bb82d0000de", <-- ID from mongodb
      "_score" : 1.0,
      "_source":{
        "some": "data2",
        "my": "document2"
      }
    }]
  }
}
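If the Mongo id is needed next to the document fields on the Elasticsearch side, it can be merged back from each hit's metadata. A minimal sketch, assuming the response has already been parsed into a Python dict with the `hits.hits[]` layout shown above; `hits_with_ids` is a hypothetical helper.

```python
def hits_with_ids(response):
    """Hypothetical: return each hit's _source with the hit-level _id added."""
    docs = []
    for hit in response["hits"]["hits"]:
        doc = dict(hit["_source"])  # copy so the response is not modified
        doc["_id"] = hit["_id"]     # string form of the MongoDB ObjectId
        docs.append(doc)
    return docs
```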

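My impression is that mongo-connector does this on purpose, to store the _id only in the corresponding ES field, but I also see no reason why it should remove _id from the document's _source at the same time. However, I noticed that the id was missing from the documents in ES when using elastic_doc_manager (v1) as well.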

Answer 1 (score: 0)

Run mongo-connector -c config.json. Using the sample config.json file, you can configure _id correctly.

Define the following in the .json file:

"docManagers": [
    {
        "docManager": "elastic2_doc_manager",
        "__targetURL": "localhost:9200",
        "bulkSize": 5000,
        "uniqueKey": "_id",
        "__autoCommitInterval": null,
        "args": {
            "aws": {
                "region_name": "your-choice"
            }
        }
    }
]