Elasticsearch mapping for a repeating structure

Time: 2018-02-01 00:23:53

Tags: elasticsearch mapping

I'm trying to create a mapping for JSON data that looks like this:

{
    "0": {
        "ip": "147.135.210.114",
        "countryName": "United States",
        "countryCode": "US",
        "frequency": "46",
        "updown": -0.25579565572885,
        "viewCount": "28"
    },
    "1": {
        "ip": "171.255.199.129",
        "danger": "94.42262",
        "countryName": "Viet Nam",
        "countryCode": "VN",
        "frequency": "40",
        "updown": -0.088216630501414,
        "viewCount": null
    },
    "2": {
        "ip": "52.163.62.13",
        "danger": "94.18168",
        "countryName": "United States",
        "countryCode": "US",
        "frequency": "46",
        "updown": -0.016932485456378,
        "viewCount": "5"
    },
    "3": {
        "ip": "151.80.140.233",
        "danger": "93.77446",
        "countryName": "Unknown",
        "countryCode": "Unknown",
        "frequency": "46",
        "updown": -0.31354507272874,
        "viewCount": "10"
    }
}

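As you can see, there are many objects (I'm not even sure they are called objects). The objects have different names (0, 1, 2, 3, 4, ...) but the same elements (ip, danger, ...).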

How can I map all these elements with different names at once? Is it even possible?

Thanks in advance :-)

1 answer:

Answer 0 (score: 0)

You can map elements with different names using the same mapping, but this is unlikely to be useful (it depends on your use case).

There are two main options:

  • using dynamic_templates to describe data with a repeated structure;
  • reshaping your data and using the nested data type.

dynamic_templates

Pros:

  • does the job
  • no need to change the data

Cons:

  • querying is hard or even impossible
  • risk of a mapping explosion

reshaping + nested

Pros:

  • does the job
  • no restriction on querying abilities

Cons:

  • you have to change the data format

The details follow below.

Using dynamic mapping with dynamic_templates

In general, dynamic mapping lets you add objects of arbitrary structure; their field types are guessed on the fly according to predefined rules.

Dynamic mapping gives you less control over your data structures and may lead to a mapping explosion (an ever-growing number of fields in the index mapping).

If you are fine with dynamic mapping and you just want to tweak the way it is applied, you may use dynamic_templates.

You would have to create the mapping before inserting any data. It might look like this:

PUT dyno
{
  "mappings": {
    "dyno": {
      "dynamic_templates": [
        {
          "dynoField": {
            "path_match": "myPath.*",
            "mapping": {
              "properties": {
                "ip": {
                  "type": "keyword"
                },
                "danger": {
                  "type": "float"
                },
                "countryName": {
                  "type": "keyword"
                },
                "countryCode": {
                  "type": "keyword"
                },
                "frequency": {
                  "type": "integer"
                },
                "updown": {
                  "type": "float"
                },
                "viewCount": {
                  "type": "integer"
                }
              }
            }
          }
        }
      ],
      "properties": {
        "myPath": {
          "type": "object"
        }
      }
    }
  }
}

Here the dynamic_templates section is applied to every key of the myPath object (0, 1, 2, 3, 4 and so on). This assumes your objects are wrapped in a top-level myPath field in each document.

After indexing the first document, the mapping returned by ES will actually look like this:

{
  "dyno": {
    "aliases": {},
    "mappings": {
      "dyno": {
        "dynamic_templates": [
          ...
        ],
        "properties": {
          "myPath": {
            "properties": {
              "0": {
                "properties": {
                  "countryCode": {
                    "type": "keyword"
                  },
                  "countryName": {
                    "type": "keyword"
                  },
                  "danger": {
                    "type": "float"
                  },
                  "frequency": {
                    "type": "integer"
                  },
                  "ip": {
                    "type": "keyword"
                  },
                  "updown": {
                    "type": "float"
                  },
                  "viewCount": {
                    "type": "integer"
                  }
                }
              },
              "1": {
                "properties": {
                  "countryCode": {
                    "type": "keyword"
                  }, ...
                }
              },
              "2": {
                "properties": {
                  "countryCode": {
                    "type": "keyword"
                  }, ...
                }
              },
              "3": {
                "properties": {
                  "countryCode": {
                    "type": "keyword"
                  }, ...
                }
              }
            }
          }
        }
      }
    }
  }
}

Note that for every different "name" of the object ES created a separate portion of the mapping.
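As a rough illustration of the mapping-explosion risk, the sketch below estimates how many fields this mapping grows to as new names appear. It assumes the default index.mapping.total_fields.limit of 1000 (object mappings count toward it) and counts myPath, one object mapping per name and its 7 leaf fields, ignoring metadata fields. The helper names are hypothetical.

```python
# Rough estimate of mapping growth for the dynamic_templates approach.
# Assumption: default index.mapping.total_fields.limit of 1000, where object
# mappings count toward the limit; metadata fields are ignored here.
DEFAULT_TOTAL_FIELDS_LIMIT = 1000

# ip, danger, countryName, countryCode, frequency, updown, viewCount
LEAF_FIELDS_PER_NAME = 7


def mapped_fields(n_names):
    """Fields in the mapping: myPath, plus one object and 7 leaves per name."""
    return 1 + n_names * (1 + LEAF_FIELDS_PER_NAME)


def max_names(limit=DEFAULT_TOTAL_FIELDS_LIMIT):
    """Largest number of distinct names that still fits under `limit`."""
    return (limit - 1) // (1 + LEAF_FIELDS_PER_NAME)


if __name__ == "__main__":
    for n in (4, 100, max_names(), max_names() + 1):
        print(n, "names ->", mapped_fields(n), "fields")
```

Under these assumptions the index stops accepting new names after roughly 124 of them, which is why this approach only suits a small, bounded set of names.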

Although this does exactly what you asked for, it might not be useful. If you have a use case where you would like to select all documents with a certain countryCode, you might build a query like this:

POST dyno/dyno/_search
{
  "query": {
    "match": {
      "myPath.1.countryCode": "VN"
    }
  }
}

But this will only match objects named "1". If the set of "names" is large or not known in advance, querying all of them becomes very complex.
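To make the problem concrete, here is a minimal Python sketch that builds such a query for a list of names known in advance, OR-ing one match clause per name inside a bool/should. The function name build_country_query is hypothetical. Every extra name adds another clause, and any name you did not list is silently missed.

```python
import json


def build_country_query(names, country_code):
    """Match country_code under any of the given (pre-known) object names.

    One `match` clause is generated per name, so the query grows with the
    number of names and cannot see names that are not passed in.
    """
    should = [
        {"match": {f"myPath.{name}.countryCode": country_code}}
        for name in names
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    body = build_country_query(["0", "1", "2", "3"], "VN")
    # Body for POST dyno/dyno/_search
    print(json.dumps(body, indent=2))
```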

Reshaping your data and using nested type

To be able to search on the countryCode of any object, you may want to reshape your data as follows:

{
  "myPath": [
    {
      "__name": "0",
      "ip": "147.135.210.114",
      "countryName": "United States",
      "countryCode": "US",
      "frequency": "46",
      "updown": -0.25579565572885,
      "viewCount": "28"
    },
    {
      "__name": "1",
      "ip": "171.255.199.129",
      "danger": "94.42262",
      "countryName": "Viet Nam",
      "countryCode": "VN",
      "frequency": "40",
      "updown": -0.088216630501414,
      "viewCount": null
    }, ...
  ]
}
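The reshaping itself is mechanical. Here is a minimal Python sketch, assuming the raw data is a dict keyed by name as in the question; reshape is a hypothetical helper, and the original key is kept in the __name field used above so no information is lost.

```python
import json


def reshape(data, name_key="__name"):
    """Turn {"0": {...}, "1": {...}} into {"myPath": [{"__name": "0", ...}, ...]}.

    Each object's original key is stored in `name_key`, and the objects
    become items of the myPath list (insertion order is preserved).
    """
    items = []
    for name, obj in data.items():
        item = {name_key: name}
        item.update(obj)
        items.append(item)
    return {"myPath": items}


if __name__ == "__main__":
    raw = {
        "0": {"ip": "147.135.210.114", "countryCode": "US", "viewCount": "28"},
        "1": {"ip": "171.255.199.129", "countryCode": "VN", "viewCount": None},
    }
    print(json.dumps(reshape(raw), indent=2))
```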

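The tricky part is that, to search the items of the myPath list independently of each other, you must use the nested datatype. Without it, ES flattens an array of objects, so two conditions in one query could be satisfied by two different items.

In this case your mapping can be dynamic or strict, whichever suits you best. In the dynamic case, the mapping might look as simple as this: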

PUT dyno
{
  "mappings": {
    "dyno": {
      "properties": {
        "myPath": {
          "type": "nested"
        }
      }
    }
  }
}

This will allow you to run a query like this:

POST dyno/dyno/_search
{
  "query": {
    "nested": {
      "path": "myPath",
      "query": {
        "match": {
          "myPath.countryCode.keyword": "VN"
        }
      }
    }
  }
}

Note that to query a nested field you have to use a nested query. A good place to get started with the nested data type is this chapter of the guide.
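The nested query also guarantees that several conditions are checked against the same list item. Below is a minimal Python sketch that builds such a request body. nested_match_all is a hypothetical helper, and the .keyword suffix assumes the default dynamic mapping for strings (a text field with a keyword sub-field), as with the dynamic nested mapping above.

```python
import json


def nested_match_all(path, conditions):
    """Build a nested query whose conditions must all hold on the SAME item.

    `conditions` maps field names (relative to `path`) to exact string values.
    The `.keyword` suffix assumes default dynamic mapping for strings.
    """
    must = [
        {"term": {f"{path}.{field}.keyword": value}}
        for field, value in conditions.items()
    ]
    return {"query": {"nested": {"path": path, "query": {"bool": {"must": must}}}}}


if __name__ == "__main__":
    body = nested_match_all("myPath", {"countryCode": "VN", "countryName": "Viet Nam"})
    # Body for POST dyno/dyno/_search
    print(json.dumps(body, indent=2))
```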

Final note about the nested field

Under the hood, ES indexes a hidden separate document for each item in a nested list, so indexing and searching nested objects is slower than for non-nested ones.

You may also consider storing each of these objects with a different "name" as a separate ES document.
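If you go the separate-documents route, each object can be sent through the _bulk API as its own document. The sketch below assumes the index is named dyno and uses each object's name as the document _id; to_bulk_ndjson is a hypothetical helper. The optional _type is only meant for typed indices such as the ES 5.x/6.x ones used in the examples above.

```python
import json


def to_bulk_ndjson(data, index="dyno", doc_type=None):
    """Turn {"0": {...}, "1": {...}} into an NDJSON body for the _bulk API.

    Each object becomes one document: an action line followed by the source
    line, with the object's name used as the document _id.
    """
    lines = []
    for name, obj in data.items():
        action = {"_index": index, "_id": name}
        if doc_type is not None:
            action["_type"] = doc_type  # only for typed indices (ES 5.x/6.x)
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(obj))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


if __name__ == "__main__":
    raw = {
        "0": {"ip": "147.135.210.114", "countryCode": "US"},
        "1": {"ip": "171.255.199.129", "countryCode": "VN"},
    }
    print(to_bulk_ndjson(raw, doc_type="dyno"), end="")
```

A country query then becomes an ordinary match on countryCode, with no nested query and no per-name fields.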


Hope that helps!