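I'm currently parsing the text of internal résumés at my company. The goal is to index everything in Elasticsearch so that it can be searched.
At the moment I have the following JSON document, with no mapping defined.
Each coworker has a list of projects, and each project records the client name: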
{
  "name": "Jean Wisser",
  "position": "Junior Developer",
  "projects": [
    {
      "client": "SutrixMedia",
      "missions": [
        "Responsible for the quality on time and within budget",
        "Writing specs, testing,..."
      ],
      "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
    },
    {
      "client": "Société Générale",
      "missions": [
        " Writing test cases and scenarios",
        " UAT"
      ],
      "technologies": "HP QTP/QC"
    }
  ]
}
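The two main questions we would like to answer are:
1. Which coworkers have already worked for a given client?
2. Which clients use a given technology?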
The first question is easy to answer: a query such as Projects.client="SutrixMedia" returns the right résumé.
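But how can I answer the second one?
I would like to run a query such as Projects.technologies="HP QTP/QC" and get back only the client name ("Société Générale" in this example), not the whole document.
Is it possible to get this answer by defining a mapping with a nested type? Or should I go for a parent/child mapping?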
Answer 0 (score: 1)
Yes, this is possible with ES 1.5.* if you map projects as a nested type and then retrieve the nested inner_hits.
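To see why the nested type matters here, the following is a small Python illustration (not Elasticsearch code). It imitates how an array of objects under the default object mapping is flattened into one multi-valued list per field, so the link between a client and its technologies is lost, whereas nested objects are matched one by one. The function names and the trimmed-down projects list are made up for this sketch.

```python
# Illustration only: imitates flattened object arrays versus nested objects.
# Nothing here talks to Elasticsearch; all names are hypothetical.

projects = [
    {"client": "SutrixMedia", "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"},
    {"client": "Société Générale", "technologies": "HP QTP/QC"},
]


def flatten(objects):
    """Collapse a list of objects into one multi-valued list per field."""
    flat = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, []).append(value)
    return flat


def clients_flattened(objects, technology):
    """Flattened view: a match can only hand back every client value."""
    flat = flatten(objects)
    if technology in flat.get("technologies", []):
        return flat["client"]
    return []


def clients_nested(objects, technology):
    """Nested view: each project is matched on its own."""
    return [o["client"] for o in objects if o["technologies"] == technology]


print(clients_flattened(projects, "HP QTP/QC"))
print(clients_nested(projects, "HP QTP/QC"))
```

The flattened variant returns both clients, while the nested variant returns only the client of the project that actually uses the technology.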
So here is the mapping for the sample document above. Note that projects is declared with the nested type, and that client and technologies each get a not_analyzed raw sub-field for exact matching:
curl -XPUT localhost:9200/resumes -d '
{
  "mappings": {
    "resume": {
      "properties": {
        "name": {
          "type": "string"
        },
        "position": {
          "type": "string"
        },
        "projects": {
          "type": "nested",
          "properties": {
            "client": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "missions": {
              "type": "string"
            },
            "technologies": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
  }
}'
Then you can index the sample document from above:
curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'
Finally, the following query retrieves only the nested inner_hits, so you get back only the nested objects that match Projects.technologies="HP QTP/QC". The top-level "_source": false suppresses the parent document, inner_hits returns just the matching nested project, and its "_source": "client" limits that project to the client field:
curl -XPOST localhost:9200/resumes/resume/_search -d '
{
  "_source": false,
  "query": {
    "nested": {
      "path": "projects",
      "query": {
        "term": {
          "projects.technologies.raw": "HP QTP/QC"
        }
      },
      "inner_hits": {
        "_source": "client"
      }
    }
  }
}'
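If the technology has to vary, the same request body can be built programmatically. The following is a minimal Python sketch that only assembles the JSON body. The helper name build_client_query and its raw_field parameter are hypothetical, and the field names assume the mapping shown earlier. Because the term query runs against the not_analyzed raw sub-field, the value must match the stored string exactly.

```python
import json


def build_client_query(technology, raw_field="projects.technologies.raw"):
    """Build the nested query body for one technology (hypothetical helper)."""
    return {
        # Do not return the parent resume document itself.
        "_source": False,
        "query": {
            "nested": {
                "path": "projects",
                # Exact match against the not_analyzed sub-field.
                "query": {"term": {raw_field: technology}},
                # Return only the matching project, limited to "client".
                "inner_hits": {"_source": "client"},
            }
        },
    }


body = build_client_query("HP QTP/QC")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, or sent with any HTTP client, as the search body.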
This yields only the client name instead of the whole matching document:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.4054651,
    "hits" : [ {
      "_index" : "resumes",
      "_type" : "resume",
      "_id" : "1",
      "_score" : 1.4054651,
      "inner_hits" : {
        "projects" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.4054651,
            "hits" : [ {
              "_index" : "resumes",
              "_type" : "resume",
              "_id" : "1",
              "_nested" : {
                "field" : "projects",
                "offset" : 1
              },
              "_score" : 1.4054651,
              "_source":{"client":"Société Générale"}   <--- here is the client name
            } ]
          }
        }
      }
    } ]
  }
}
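Since the client name sits several levels deep in that response, a small helper can pull it out. The following Python sketch walks hits.hits[].inner_hits.projects.hits.hits[]._source.client. The function name extract_clients is hypothetical, and sample_response is a hand-trimmed copy of the response above, not fresh output from a server.

```python
def extract_clients(response, path="projects"):
    """Collect the client of every matching nested project (hypothetical helper)."""
    clients = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            client = nested_hit.get("_source", {}).get("client")
            if client is not None:
                clients.append(client)
    return clients


# Trimmed copy of the response shown above, keeping only the fields used here.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "projects": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_nested": {"field": "projects", "offset": 1},
                                    "_source": {"client": "Société Générale"},
                                }
                            ],
                        }
                    }
                },
            }
        ],
    }
}

print(extract_clients(sample_response))
```

With the trimmed response this prints a one-element list containing "Société Générale".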