Elasticsearch data model

Time: 2015-06-09 09:40:07

Tags: elasticsearch nested parent-child datamodel

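I am currently parsing the text of my company's internal resumes. The goal is to index everything in Elasticsearch so that it can be searched.

Each colleague has a list of projects, and each project has a client name. For the moment I have the following JSON document, with no mapping defined: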

{
    "name": "Jean Wisser",
    "position": "Junior Developer",
    "projects": [
        {
            "client": "SutrixMedia",
            "missions": [
                "Responsible for the quality on time and within budget",
                "Writing specs, testing,..."
            ],
            "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
        },
        {
            "client": "Société Générale",
            "missions": [
                " Writing test cases and scenarios",
                " UAT"
             ],
            "technologies": "HP QTP/QC"
        }
    ]
}

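The two main questions we would like to answer are:

  1. Which colleagues have already worked for this company?
  2. Which clients use this technology?

The first question is easy to answer. For example, Projects.client="SutrixMedia" gives me the right resume.

But how can I answer the second one?

I would like to run a query like Projects.technologies="HP QTP/QC" and get back only the client name ("Société Générale" in this case), not the whole document.

Is it possible to get this answer by defining a mapping with the nested type? Or should I go for a parent/child mapping?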

1 answer:

Answer 0 (score: 1)

Yes, indeed, that is possible with ES 1.5.* if you map projects as a nested type and then retrieve the nested inner_hits.
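To see why the nested type matters here, recall that Elasticsearch indexes an array of plain (non-nested) objects as flattened multi-valued fields, which drops the link between a given client and its technologies. The snippet below is only a simplified illustration of that flattening in plain Python, not Elasticsearch code; the helper name flatten_object_array is made up for this sketch.

```python
def flatten_object_array(field, objects):
    """Mimic, in simplified form, how a non-nested array of objects is indexed:
    every key becomes one multi-valued field, and object boundaries are lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            values = value if isinstance(value, list) else [value]
            flat.setdefault(f"{field}.{key}", []).extend(values)
    return flat


projects = [
    {"client": "SutrixMedia", "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"},
    {"client": "Société Générale", "technologies": "HP QTP/QC"},
]

flat = flatten_object_array("projects", projects)
for name, values in flat.items():
    print(name, values)
```

After flattening there is no way to tell which client goes with "HP QTP/QC". Declaring projects as nested keeps each project as its own hidden document, so a match can be traced back to a single project.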

So here is the mapping for the sample document above, with "projects" declared as a nested type:

curl -XPUT localhost:9200/resumes -d '
{
  "mappings": {
    "resume": {
      "properties": {
        "name": {
          "type": "string"
        },
        "position": {
          "type": "string"
        },
        "projects": {
          "type": "nested",
          "properties": {
            "client": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "missions": {
              "type": "string"
            },
            "technologies": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
  }
}'

Then you can index the sample document from above:

curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'

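Finally, the following query retrieves only the nested inner_hits, so you get back just the nested objects that match Projects.technologies="HP QTP/QC". Within inner_hits, "_source": "client" limits each matching nested document to its client field: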

curl -XPOST localhost:9200/resumes/resume/_search -d '
{
  "_source": false,
  "query": {
    "nested": {
      "path": "projects",
      "query": {
        "term": {
          "projects.technologies.raw": "HP QTP/QC"
        }
      },
      "inner_hits": {
        "_source": "client"
      }
      }
    }
  }
}'
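If the technology (or the client) changes from one search to the next, the request body can be generated instead of written by hand. The following is a minimal sketch using only the Python standard library; the function name nested_term_query and its parameters are hypothetical, and it assumes the mapping above, including the "raw" sub-fields.

```python
import json


def nested_term_query(field, value, inner_source=None):
    """Build a search body matching resumes that have a project whose
    not_analyzed `raw` sub-field of `field` equals `value` exactly.

    If `inner_source` is given, only the matching nested projects are
    returned (via inner_hits), restricted to that source field.
    """
    nested = {
        "path": "projects",
        "query": {"term": {f"projects.{field}.raw": value}},
    }
    body = {"query": {"nested": nested}}
    if inner_source is not None:
        body["_source"] = False  # skip the top-level resume document
        nested["inner_hits"] = {"_source": inner_source}
    return body


# Question 2: which client uses this technology?
print(json.dumps(nested_term_query("technologies", "HP QTP/QC", "client"), indent=2))

# Question 1: which colleagues worked for this client? (whole resumes returned)
print(json.dumps(nested_term_query("client", "SutrixMedia"), indent=2))
```

Once projects is mapped as nested, its fields have to be searched through a nested query, so the first question needs this form as well, just without inner_hits.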

This yields only the client name instead of the whole matching document:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.4054651,
    "hits" : [ {
      "_index" : "resumes",
      "_type" : "resume",
      "_id" : "1",
      "_score" : 1.4054651,
      "inner_hits" : {
        "projects" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.4054651,
            "hits" : [ {
              "_index" : "resumes",
              "_type" : "resume",
              "_id" : "1",
              "_nested" : {
                "field" : "projects",
                "offset" : 1
              },
              "_score" : 1.4054651,
              "_source":{"client":"Société Générale"}  <--- here is the client name
            } ]
          }
        }
      }
    } ]
  }
}
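On the client side, the names still have to be pulled out of the inner_hits section of that response. Here is a small sketch of one way to do it, assuming the response has already been parsed into a Python dict; the function name matching_clients is hypothetical, and the sample dict is a trimmed copy of the response above.

```python
def matching_clients(response):
    """Collect the client name of every matching nested project."""
    clients = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get("projects", {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            clients.append(inner_hit["_source"]["client"])
    return clients


# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": 1,
        "hits": [{
            "_index": "resumes",
            "_type": "resume",
            "_id": "1",
            "inner_hits": {
                "projects": {
                    "hits": {
                        "total": 1,
                        "hits": [{
                            "_nested": {"field": "projects", "offset": 1},
                            "_source": {"client": "Société Générale"},
                        }],
                    }
                }
            },
        }],
    }
}

print(matching_clients(response))
```

If several resumes mention the same client, the list will contain duplicates; wrapping the result in set() is one simple way to deduplicate it.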