Strange results when querying nested objects

Date: 2016-09-16 08:33:29

Tags: elasticsearch

Elasticsearch version: 2.3.3

Plugins installed: none

JVM version: 1.8.0_91

OS version: Linux version 3.19.0-56-generic (Ubuntu 4.8.2-19ubuntu1)

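I get strange results when I query nested objects on multiple paths. I want to find all female patients with dementia, and the matching patients do appear in the results. However, I also get other diagnoses that I did not search for, which belong to these same patients.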

For example, even though I only searched for dementia, I also get the following diagnoses:

  • Mental disorder, not otherwise specified
  • Essential (primary) hypertension

Why? I want only the female dementia results and do not want the other diagnoses.

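Client_Demographic_Details contains one document per patient; Diagnosis contains multiple documents per patient. The end goal is to index all of my data from a PostgreSQL database (72 tables in total, over 1600 columns) into Elasticsearch.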
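As a minimal sketch of the denormalization step, assuming the two PostgreSQL tables have already been fetched as lists of (Patient_ID, value) tuples (these row shapes and the helper name `build_patient_docs` are hypothetical), the rows can be grouped into one document per patient in the shape used by the `denorm1` mapping:

```python
def build_patient_docs(demographic_rows, diagnosis_rows):
    """Group flat (Patient_ID, value) rows into one nested document per patient.

    Hypothetical input shapes:
      demographic_rows: [(patient_id, gender_description), ...]  # one row per patient
      diagnosis_rows:   [(patient_id, diagnosis_text), ...]      # many rows per patient
    """
    docs = {}
    for patient_id, gender in demographic_rows:
        docs[patient_id] = {
            # Kept as a one-element list to match the nested denorm1 mapping.
            "Client_Demographic_Details": [
                {"Patient_ID": patient_id, "Gender_Description": gender}
            ],
            "Diagnosis": [],
        }
    for patient_id, diagnosis in diagnosis_rows:
        # Assumption: diagnoses for patients without a demographic row are skipped.
        if patient_id in docs:
            docs[patient_id]["Diagnosis"].append(
                {"Patient_ID": patient_id, "Diagnosis": diagnosis}
            )
    return docs
```

Each value in the returned dict could then be indexed as one `Patient` document, using its key as the `_id`.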

Query:

{'query': {
       'bool': {
           'must': [
               {'nested': {
                   'path': 'Diagnosis',
                   'query': {
                       'bool': {
                           'must': [{'match_phrase': {'Diagnosis.Diagnosis': {'query': "dementia"}}}]
                       }  
                   }
               }},
               {'nested': {
                   'path': 'Client_Demographic_Details',
                   'query': {
                       'bool': {
                           'must': [{'match_phrase': {'Client_Demographic_Details.Gender_Description': {'query': "female"}}}]
                       }  
                   }
               }}
           ]
       }
    }}

Result:

{
  "hits": {
    "hits": [
      {
        "_score": 3.4594634, 
        "_type": "Patient", 
        "_id": "72", 
        "_source": {
          "Client_Demographic_Details": [
            {
              "Gender_Description": "Female", 
              "Patient_ID": 72
            }
          ], 
          "Diagnosis": [
            {
              "Diagnosis": "F00.0 -  Dementia in Alzheimer's disease with early onset", 
              "Patient_ID": 72
            }, 
            {
              "Patient_ID": 72, 
              "Diagnosis": "F99.X -  Mental disorder, not otherwise specified"
            }, 
            {
              "Patient_ID": 72, 
              "Diagnosis": "I10.X -  Essential (primary) hypertension"
            }
          ]
        }, 
        "_index": "denorm1"
      }
    ], 
    "total": 6, 
    "max_score": 3.4594634
  }, 
  "_shards": {
    "successful": 5, 
    "failed": 0, 
    "total": 5
  }, 
  "took": 8, 
  "timed_out": false
}
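The extra diagnoses are expected behaviour: a `nested` query only decides which root (`Patient`) documents match, and unless source filtering is applied, `_source` returns the complete document as it was indexed, including every nested `Diagnosis` object. The toy model below is a pure-Python simplification (word-token phrase matching, no real analysis or scoring; `phrase_match` and `search` are hypothetical names) that mimics this and also records which nested objects actually matched, which is the information `inner_hits` reports:

```python
import re


def _tokens(text):
    # Rough stand-in for an analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def phrase_match(text, phrase):
    """True if the phrase's tokens appear consecutively in the text's tokens."""
    t, p = _tokens(text), _tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def search(docs, diagnosis_phrase, gender_phrase):
    """Mimic bool/must over two nested clauses against a dict of root docs."""
    hits = []
    for doc_id, source in docs.items():
        diag_hits = [d for d in source["Diagnosis"]
                     if phrase_match(d["Diagnosis"], diagnosis_phrase)]
        gender_hits = [g for g in source["Client_Demographic_Details"]
                       if phrase_match(g["Gender_Description"], gender_phrase)]
        # Each nested clause needs at least one matching nested object.
        if diag_hits and gender_hits:
            hits.append({
                "_id": doc_id,
                "_source": source,  # the whole document, all diagnoses included
                "inner_hits": {"Diagnosis": diag_hits,
                               "Client_Demographic_Details": gender_hits},
            })
    return hits
```

In this model the patient is returned with all of its diagnoses in `_source`, while only the dementia entry appears under `inner_hits`.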

Mapping:

{
  "denorm1" : {
    "aliases" : { },
    "mappings" : {
      "Patient" : {
        "properties" : {
          "Client_Demographic_Details" : {
            "type" : "nested",
            "properties" : {
              "Patient_ID" : {
                "type" : "long"
              },
              "Gender_Description" : {
                "type" : "string"
              }
            }
          },
          "Diagnosis" : {
            "type" : "nested",
            "properties" : {
              "Patient_ID" : {
                "type" : "long"
              },
              "Diagnosis" : {
                "type" : "string"
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1473974457603",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "Jo9cI4kRQjeWcZ7WMB6ZAw",
        "version" : {
          "created" : "2030399"
        }
      }
    },
    "warmers" : { }
  }
}

2 Answers:

Answer 0 (score: 1)

Try this:

{
  "_source": {
    "exclude": [
      "Client_Demographic_Details",
      "Diagnosis"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "Diagnosis",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase": {
                      "Diagnosis.Diagnosis": {
                        "query": "dementia"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        },
        {
          "nested": {
            "path": "Client_Demographic_Details",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase": {
                      "Client_Demographic_Details.Gender_Description": {
                        "query": "female"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}

The matching nested documents will appear in inner_hits, and the remaining fields in _source. I know this is not a definitive approach.
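Assuming the response has been parsed into a Python dict, the matching diagnoses can be read from `inner_hits`. When an `inner_hits` block has no `name`, Elasticsearch keys nested inner hits by the nested path, so they appear under `inner_hits["Diagnosis"]`, each with its own `hits.hits` list. The helper name `matching_diagnoses` is hypothetical, and the sample response in the check below is made up to show the shape:

```python
def matching_diagnoses(response, path="Diagnosis", field="Diagnosis"):
    """Map each hit's _id to the values of `field` in its matching nested objects."""
    result = {}
    for hit in response["hits"]["hits"]:
        # Default inner_hits name is the nested path (assumed: no custom name set).
        inner = hit.get("inner_hits", {}).get(path, {})
        nested_hits = inner.get("hits", {}).get("hits", [])
        result[hit["_id"]] = [h["_source"][field] for h in nested_hits]
    return result
```

Patients whose hit has no inner hits for the path map to an empty list rather than raising.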

Answer 1 (score: 0)

As @blackmamba suggested, I built a mapping with Client_Demographic_Details as the root object and Diagnosis as a nested object.

Mapping:

{
  "denorm2" : {
    "aliases" : { },
    "mappings" : {
      "Patient" : {
        "properties" : {
          "BRC_ID" : {
            "type" : "long"
          },
          "Diagnosis" : {
            "type" : "nested",
            "properties" : {
              "BRC_ID" : {
                "type" : "long"
              },
              "Diagnosis" : {
                "type" : "string"
              }
            }
          },
          "Gender_Description" : {
            "type" : "string"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1474031740689",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "fMeKa6sfThmxkg_281WdHA",
        "version" : {
          "created" : "2030399"
        }
      }
    },
    "warmers" : { }
  }
} 
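Because Client_Demographic_Details holds exactly one object per patient, it does not need to be nested; its fields can sit on the root document, as in the `denorm2` mapping. A minimal sketch of that reshaping, assuming input documents shaped like the `denorm1` source (the helper name `flatten_demographics` is hypothetical, and renaming `Patient_ID` to `BRC_ID` is an assumption made only to match the field names in this mapping):

```python
def flatten_demographics(doc, old_id="Patient_ID", new_id="BRC_ID"):
    """Lift the single Client_Demographic_Details object to the root.

    Diagnosis stays a nested list; the ID field is renamed from `old_id`
    to `new_id` in both places (assumption matching the denorm2 mapping).
    """
    # Unpacking fails loudly if there is not exactly one demographic object.
    (demo,) = doc["Client_Demographic_Details"]
    root = {new_id if k == old_id else k: v for k, v in demo.items()}
    root["Diagnosis"] = [
        {new_id if k == old_id else k: v for k, v in d.items()}
        for d in doc["Diagnosis"]
    ]
    return root
```

With gender on the root, it can be queried with a plain `match_phrase`, and only Diagnosis needs a `nested` clause.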

Query:

I added source filtering and highlighting.

{
'_source': {
    'exclude': ['Diagnosis'],
    'include': ['BRC_ID', 'Gender_Description']
},
'highlight': {
    'fields': {
        'Gender_Description': {}
    }                
},
'query': {
    'bool': {
        'must': [
            {'nested': {
                'path': 'Diagnosis',
                'query': {
                    'bool': {
                        'must': [{'match_phrase': {'Diagnosis.Diagnosis': {'query': "dementia"}}}]
                    }  
                },
                'inner_hits': {
                    'highlight': {
                        'fields': {
                            'Diagnosis.Diagnosis': {}    
                        }    
                    },    
                    '_source': ['BRC_ID', 'Diagnosis']
                }
            }},
            {'match_phrase': {'Gender_Description': {'query': "female"}}}
        ]
    }
}}
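The same request body can be generated for any diagnosis/gender pair. `build_query` is a hypothetical helper that returns the query above as a plain Python dict, leaving it to whichever client you use to send it:

```python
def build_query(diagnosis, gender):
    """Return the denorm2 search body: nested Diagnosis clause + root-level gender."""
    return {
        "_source": {
            "exclude": ["Diagnosis"],
            "include": ["BRC_ID", "Gender_Description"],
        },
        "highlight": {"fields": {"Gender_Description": {}}},
        "query": {
            "bool": {
                "must": [
                    {"nested": {
                        "path": "Diagnosis",
                        "query": {"bool": {"must": [
                            {"match_phrase": {"Diagnosis.Diagnosis": {"query": diagnosis}}}
                        ]}},
                        # Matching Diagnosis objects come back here, highlighted.
                        "inner_hits": {
                            "highlight": {"fields": {"Diagnosis.Diagnosis": {}}},
                            "_source": ["BRC_ID", "Diagnosis"],
                        },
                    }},
                    # Gender is a root field in denorm2, so no nested wrapper.
                    {"match_phrase": {"Gender_Description": {"query": gender}}},
                ]
            }
        },
    }
```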