How to use nested inside nested (Elasticsearch mapping)

Date: 2019-04-24 06:59:51

Tags: json python-3.x elasticsearch relationship

I want to map the following JSON with an Elasticsearch mapping:

JSON:

{"user_id": {
    "data_flow_id_1": [
        {"file_location": "C:/ewew", "timestamp": "2019-01-01T00:00:00"},
        {"file_location": "C:/ewew2", "timestamp": "2019-02-01T00:00:00"}
    ],
    "data_flow_id_2": [
        {"file_location": "C:/ewew3", "timestamp": "2019-03-01T00:00:00"},
        {"file_location": "C:/ewew4", "timestamp": "2019-04-01T00:00:00"}
    ]
}}

So a "user_id" "owns" several data_flow_ids, each with its own file locations. This is as far as I have got, but it doesn't fully model what the JSON describes:

ES mapping:

{
  "mappings": {
    "properties": {
      "dataflow_type": {
        "type": "nested",
        "properties": {
          "user_id": {"type": "keyword"},
          "data_flow_id": {"type": "keyword"},
          "file_location": {"type": "text"},
          "timestamp": {"type": "date"}
        }
      }
    }
  }
}

I'm struggling to nest the data_flow_id_* part inside user_id. Do I need a nested inside another nested?

Update: something like this?

{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "nested",
        "properties": {
          "data_flow_id": {
            "type": "nested",
            "properties": {
              "file_location": {"type": "text"},
              "timestamp": {"type": "date"}
            }
          }
        }
      }
    }
  }
}
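One thing the updated mapping leaves open is where the IDs themselves go. In the question's JSON, `user_id` and `data_flow_id_1`/`data_flow_id_2` are object keys, and Elasticsearch treats object keys as field names, so every new data flow ID would add a new field to the mapping. A minimal Python sketch, assuming hypothetical `id` fields are added to both nested levels, of reshaping the JSON so the IDs become values in a nested-inside-nested layout:

```python
import json

def to_nested_doc(source):
    """Turn {user: {flow: [entries]}} into one document per user.

    The "id" keys are hypothetical: the mapping above would need an
    "id" field (e.g. keyword) on both nested levels to hold them.
    """
    docs = []
    for user, flows in source.items():
        flow_objects = []
        for flow, entries in flows.items():
            for entry in entries:
                # Each nested data_flow_id object holds one file entry,
                # tagged with the data flow it belongs to.
                flow_objects.append({"id": flow, **entry})
        docs.append({"user_id": [{"id": user, "data_flow_id": flow_objects}]})
    return docs

source = {"user_id": {
    "data_flow_id_1": [
        {"file_location": "C:/ewew", "timestamp": "2019-01-01T00:00:00"},
        {"file_location": "C:/ewew2", "timestamp": "2019-02-01T00:00:00"},
    ],
    "data_flow_id_2": [
        {"file_location": "C:/ewew3", "timestamp": "2019-03-01T00:00:00"},
        {"file_location": "C:/ewew4", "timestamp": "2019-04-01T00:00:00"},
    ],
}}

print(json.dumps(to_nested_doc(source)[0], indent=2))
```

Querying this layout needs two levels of `nested` query (outer path `user_id`, inner path `user_id.data_flow_id`), which is the complexity the answer tries to avoid.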

1 Answer:

Answer 0 (score: 1)

I suggest using the mapping below, which avoids excessive nesting:

PUT myindex
{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "data_flow_id": {
        "type": "keyword"
      },
      "file_location": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}

Each file entry is then indexed as its own document, as shown below:

PUT myindex/_doc/1
{
  "user_id": "some_id",
  "data_flow_id": "data_flow_id_1",
  "file_location": "C:/ewew",
  "timestamp": "2019-01-01T00:00:00"
}

The other documents can be added in the same way:

PUT myindex/_doc/2
{"user_id":"some_id","data_flow_id":"data_flow_id_1","file_location":"C:/ewew2","timestamp":"2019-02-01T00:00:00"}

PUT myindex/_doc/3
{"user_id":"some_id","data_flow_id":"data_flow_id_2","file_location":"C:/ewew3","timestamp":"2019-03-01T00:00:00"}

PUT myindex/_doc/4
{"user_id":"some_id","data_flow_id":"data_flow_id_2","file_location":"C:/ewew4","timestamp":"2019-04-01T00:00:00"}

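The downside of this approach is that you have to index 4 documents instead of 2 for the JSON in the question. However, it keeps search queries simple. Nested fields, by contrast, can only be searched with `nested` queries, which makes queries more complex.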
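More documents do not have to mean more requests: Elasticsearch's `_bulk` endpoint accepts newline-delimited JSON, an action line followed by the document source, terminated by a newline. A small Python sketch that builds such a body from already-flattened documents (the helper name `bulk_body` is hypothetical; actually sending it, e.g. as `POST _bulk` with `Content-Type: application/x-ndjson`, is left out):

```python
import json

def bulk_body(index, docs, first_id=1):
    """Build an NDJSON body for the _bulk API: one "index" action line
    plus one source line per document, ending with a newline."""
    lines = []
    for doc_id, doc in enumerate(docs, start=first_id):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

docs = [
    {"user_id": "some_id", "data_flow_id": "data_flow_id_1",
     "file_location": "C:/ewew", "timestamp": "2019-01-01T00:00:00"},
    {"user_id": "some_id", "data_flow_id": "data_flow_id_1",
     "file_location": "C:/ewew2", "timestamp": "2019-02-01T00:00:00"},
]
print(bulk_body("myindex", docs), end="")
```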

Sample query to get the documents whose data_flow_id is data_flow_id_1:

POST myindex/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "data_flow_id": "data_flow_id_1"
          }
        }
      ]
    }
  }
}
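The same `bool`/`filter` structure extends to several exact-match conditions, for example all files of one data flow for one user. A Python sketch that builds such a request body (the helper `filter_query` is hypothetical; `term` matches exact values, which is why `user_id` and `data_flow_id` are mapped as `keyword`):

```python
import json

def filter_query(**exact):
    """Build a search body whose bool filter requires every given
    field to equal the given value exactly (one term query per field)."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in exact.items()]
            }
        }
    }

body = filter_query(user_id="some_id", data_flow_id="data_flow_id_1")
print(json.dumps(body, indent=2))  # body for POST myindex/_search
```

Clauses in `filter` context do not contribute to scoring, which suits exact ID lookups like these.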