Mysterious wrong values in ElasticSearch numeric fields

Time: 2015-12-19 17:04:35

Tags: elasticsearch mongoid

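I have spent the last two days investigating this puzzling issue: I have an index with a custom mapping, on which I run some aggregations. The problem is that the aggregation results for numeric fields contain values that do not exist in the database the data was imported from, even though the number of results is the same.

I found a similar question, where the cause was a field mapping that was inconsistent across indices, but in my case the field is mapped with the same type everywhere. As far as I have checked, the problem occurs in the fields award.value.amount, award.value.x_amountEur and tender.value.x_amountEur. This is the current mapping (including the affected fields), as returned by curl -XGET 'http://localhost:9200/documents/_mappings?pretty&human':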

     {
      "documents" : {
        "mappings" : {
          "document" : {
            "properties" : {
              "additionalIdentifiers" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "award" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "contract_number" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                  },
                  "date" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "x_day" : {
                        "type" : "integer"
                      },
                      "x_month" : {
                        "type" : "integer"
                      },
                      "x_year" : {
                        "type" : "integer"
                      }
                    }
                  },
                  "description" : {
                    "type" : "string"
                  },
                  "initialValue" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "currency" : {
                        "type" : "string"
                      },
                      "x_vat" : {
                        "type" : "float"
                      }
                    }
                  },
                  "minValue" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      }
                    }
                  },
                  "title" : {
                    "type" : "string"
                  },
                  "value" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "currency" : {
                        "type" : "string"
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      },
                      "x_vat" : {
                        "type" : "float"
                      },
                      "x_vatbool" : {
                        "type" : "boolean"
                      }
                    }
                  },
                  "x_initialValue" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      },
                      "x_vatbool" : {
                        "type" : "boolean"
                      }
                    }
                  }
                }
              },
              "awardCriteria" : {
                "type" : "string"
              },
              "contract_number" : {
                "type" : "string"
              },
              "document_id" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "numberOfTenderers" : {
                "type" : "string"
              },
              "procurementMethod" : {
                "type" : "string"
              },
              "procuring_entity" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "address" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "country" : {
                        "type" : "string"
                      },
                      "countryName" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                      },
                      "email" : {
                        "type" : "string"
                      },
                      "locality" : {
                        "type" : "string"
                      },
                      "postalCode" : {
                        "type" : "string"
                      },
                      "streetAddress" : {
                        "type" : "string"
                      },
                      "telephone" : {
                        "type" : "string"
                      },
                      "x_url" : {
                        "type" : "string"
                      }
                    }
                  },
                  "name" : {
                    "type" : "string"
                  },
                  "x_slug" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                  }
                }
              },
              "suppliers" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "address" : {
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "email" : {
                        "type" : "string"
                      },
                      "locality" : {
                        "type" : "string"
                      },
                      "postalCode" : {
                        "type" : "string"
                      },
                      "streetAddress" : {
                        "type" : "string"
                      },
                      "telephone" : {
                        "type" : "string"
                      },
                      "x_url" : {
                        "type" : "string"
                      }
                    }
                  },
                  "name" : {
                    "type" : "string"
                  },
                  "x_slug" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                  }
                }
              },
              "tender" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "value" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "currency" : {
                        "type" : "string"
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      },
                      "x_vat" : {
                        "type" : "float"
                      },
                      "x_vatbool" : {
                        "type" : "boolean"
                      }
                    }
                  }
                }
              }  

This is the aggregation I use to get the contract values for each supplier / procuring_entity pair:

    Document.es.search({
      "search_type": "count",
      "body": {
        "aggregations": {
          "entities": {
            "nested": {
              "path": "procuring_entity"
            },
            "aggs": {
              "procuring_entity_names": {
                "terms": {
                  "field": "procuring_entity.x_slug",
                  "size": 0
                },
                "aggs": {
                  "suppliers": {
                    "nested": {
                      "path": "suppliers"
                    },
                    "aggs": {
                      "suppliers_names": {
                        "terms": {
                          "field": "suppliers.x_slug",
                          "size": 0
                        },
                        "aggs": {
                          "awards": {
                            "nested": {
                              "path": "award.value"
                            },
                            "aggs": {
                              "award_amounts": {
                                "terms": {
                                  "field": "award.value.x_amountEur",
                                  "size": 0
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    })

The result with type float is:

    {"entities"=>
     {"doc_count"=>24300,
      "procuring_entity_names"=>
       {"doc_count_error_upper_bound"=>0,
        "sum_other_doc_count"=>0,
        "buckets"=>
         [{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
           "doc_count"=>1360,
           "suppliers"=>
            {"doc_count"=>1360,
             "suppliers_names"=>
              {"doc_count_error_upper_bound"=>0,
               "sum_other_doc_count"=>0,
               "buckets"=>
                [{"key"=>"recipe-plus-as",
                  "doc_count"=>388,
                  "awards"=>
                   {"doc_count"=>388,
                    "awards"=>
                     {"doc_count_error_upper_bound"=>0,
                      "sum_other_doc_count"=>0,
                      "buckets"=>
                       [{"key"=>3679.086669921875, "doc_count"=>373},
                        {"key"=>0.0, "doc_count"=>12},
                        {"key"=>73610.3203125, "doc_count"=>1},
                        {"key"=>244000.0, "doc_count"=>1},
                        {"key"=>342348.9375, "doc_count"=>1}]}}}

The problem is that in MongoDB the same query returns 388 documents, all of which have award.value.x_amountEur = 3679.08661250056, as this Mongoid query shows:

    Document.where(:"procuring_entity.x_slug" => "vsia-bernu-kliniska-universitates-slimnica")
            .keep_if{|doc| doc.suppliers.first.x_slug == "recipe-plus-as"}
            .map{|doc| doc.award.value.x_amountEur}.uniq 
    =>[3679.08661250056]

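A query run directly against MongoDB returns the same thing. I also tried mapping the affected fields as double, with the same result, while long returns the following (even more incorrect) results: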

    {"entities"=>
     {"doc_count"=>24300, 
      "procuring_entity_names"=> 
       {"doc_count_error_upper_bound"=>0, 
        "sum_other_doc_count"=>0, 
        "buckets"=> 
         [{"key"=>"vsia-bernu-kliniska-universitates-slimnica", 
           "doc_count"=>1360, 
           "suppliers"=> 
            {"doc_count"=>1360, 
             "suppliers_names"=> 
              {"doc_count_error_upper_bound"=>0, 
               "sum_other_doc_count"=>0, 
               "buckets"=> 
                [{"key"=>"recipe-plus-as", 
                  "doc_count"=>388, 
                  "awards"=> 
                   {"doc_count"=>388, 
                    "awards"=> 
                     {"doc_count_error_upper_bound"=>0, 
                      "sum_other_doc_count"=>0, 
                      "buckets"=> 
                       [{"key"=>3679, "doc_count"=>371}, 
                        {"key"=>0, "doc_count"=>12}, 
                        {"key"=>44300, "doc_count"=>1}, 
                        {"key"=>80472, "doc_count"=>1}, 
                        {"key"=>331636, "doc_count"=>1}, 
                        {"key"=>342348, "doc_count"=>1}, 
                        {"key"=>1658805, "doc_count"=>1}]}}}
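
As a side check on the two result dumps above, the largest bucket key in each can be reproduced from the MongoDB value by plain numeric conversion. The sketch below is not part of the original post: `to_float32` is a hypothetical helper name, and it assumes that a float mapping stores values as 32-bit IEEE 754 floats and that a long mapping truncates the fractional part. Rounding the stored value to single precision gives 3679.086669921875, which matches the first key of the float result, and truncation gives 3679, which matches the first key of the long result.

```ruby
# Round-trip a Ruby Float (64-bit) through a 32-bit IEEE 754 float.
# 'e' is the little-endian single-precision directive of Array#pack.
def to_float32(value)
  [value].pack('e').unpack1('e')
end

stored = 3679.08661250056   # value reported by the Mongoid query

puts to_float32(stored)     # compare with the first key of the float result
puts stored.to_i            # truncation; compare with the first key of the long result
```

This only accounts for the digits of the 3679 bucket. It does not explain the other bucket keys, the differing bucket counts, or why the double mapping reportedly gave the same result.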

I am using Elasticsearch 2.0, mongoid 5.0.1 and mongoid-elasticsearch for indexing. I can't think of anything else to try, so any suggestion is welcome and appreciated.

1 answer:

Answer 0 (score: 2)

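I tried to test your scenario with ES 2.0 and ran into something I was not aware of. I could not create buckets for award.value.x_amountEur unless I used a reverse_nested aggregation to "step out" of one nested path before entering another.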

So, instead of using the awards aggregation directly, I used the same aggregation, but "wrapped" in a reverse_nested aggregation:

    "aggs": {
      "getting_back": {
        "reverse_nested": {},
        "aggs": {
          "awards": {
            "nested": {
              "path": "award.value"
            },
            "aggs": {
              "award_amounts": {
                "terms": {
                  "field": "award.value.x_amountEur"
                }
              }
            }
          }
        }
      }
    }

With this, the results I see look fine.

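Later edit: following @Val's advice, which is more general than mine, the complete solution is to use a reverse_nested aggregation between the suppliers and awards aggregations.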
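
A minimal sketch of how that placement could look when the aggregation body is built as a Ruby hash, using the field names and paths from the question. The helper names `via_root`, `awards_agg` and `suppliers_agg` are hypothetical, and `getting_back` is just the aggregation name used in the snippet above. Only the suppliers-to-awards hop is wrapped, since that is the one this answer names; whether the procuring_entity-to-suppliers hop needs the same treatment is not stated here.

```ruby
require 'json'

# Hypothetical helper: wrap sub-aggregations in a reverse_nested step so they
# run from the root document instead of the current nested path.
def via_root(name, sub_aggs)
  { name => { 'reverse_nested' => {}, 'aggs' => sub_aggs } }
end

# The awards aggregation from the question, unchanged.
def awards_agg
  {
    'awards' => {
      'nested' => { 'path' => 'award.value' },
      'aggs' => {
        'award_amounts' => {
          'terms' => { 'field' => 'award.value.x_amountEur', 'size' => 0 }
        }
      }
    }
  }
end

# The suppliers aggregation, with reverse_nested placed between
# suppliers_names and awards.
def suppliers_agg
  {
    'suppliers' => {
      'nested' => { 'path' => 'suppliers' },
      'aggs' => {
        'suppliers_names' => {
          'terms' => { 'field' => 'suppliers.x_slug', 'size' => 0 },
          'aggs' => via_root('getting_back', awards_agg)
        }
      }
    }
  }
end

puts JSON.pretty_generate(suppliers_agg)
```

The printed hash would replace the value of the "aggs" key under procuring_entity_names in the query from the question.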