While using AWS Elasticsearch (2.3), I loaded some sample data from https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json with the following mapping:
$ curl --url "https://my_es_id.us-east-1.es.amazonaws.com/shakespeare/_mapping"
{
"shakespeare": {
"mappings": {
"act": {
"properties": {
"line_id": {
"type": "integer"
},
"line_number": {
"type": "string"
},
"play_name": {
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
},
"type": "string"
},
"speaker": {
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
},
"type": "string"
},
"speech_number": {
"type": "integer"
},
"text_entry": {
"type": "string"
}
}
},
"line": {
"properties": {
"line_id": {
"type": "integer"
},
"line_number": {
"type": "string"
},
"play_name": {
"type": "string"
},
"speaker": {
"type": "string"
},
"speech_number": {
"type": "integer"
},
"text_entry": {
"type": "string"
}
}
},
"scene": {
"properties": {
"line_id": {
"type": "integer"
},
"line_number": {
"type": "string"
},
"play_name": {
"type": "string"
},
"speaker": {
"type": "string"
},
"speech_number": {
"type": "integer"
},
"text_entry": {
"type": "string"
}
}
}
}
}
}
Now, when I run a query to get the per-speaker document counts across the entire data set, I get the following result.
$ curl -XPOST "https://my_es_id.us-east-1.es.amazonaws.com/shakespeare/_search" -d'
{
  "aggs": {
    "speakers": {
      "terms": { "field": "speaker.raw" }
    }
  }
}'
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"aggregations": {
"speakers": {
"buckets": [
{
"doc_count": 4,
"key": "BASTARD"
},
{
"doc_count": 3,
"key": "HAMLET"
},
{
"doc_count": 3,
"key": "KING HENRY VIII"
},
{
"doc_count": 3,
"key": "OF SYRACUSE"
},
{
"doc_count": 3,
"key": "PROSPERO"
},
{
"doc_count": 3,
"key": "WARWICK"
},
{
"doc_count": 2,
"key": "ADRIANO DE ARMADO"
},
{
"doc_count": 2,
"key": "ARCHBISHOP OF YORK"
},
{
"doc_count": 2,
"key": "AUFIDIUS"
},
{
"doc_count": 2,
"key": "BENEDICK"
}
],
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 153
}
},
"hits": {
"hits": [
{
"_id": "0",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 1,
"line_number": "",
"play_name": "Henry IV",
"speaker": "",
"speech_number": "",
"text_entry": "ACT I"
},
"_type": "act"
},
{
"_id": "14",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 15,
"line_number": "1.1.12",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Did lately meet in the intestine shock"
},
"_type": "line"
},
{
"_id": "19",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 20,
"line_number": "1.1.17",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "The edge of war, like an ill-sheathed knife,"
},
"_type": "line"
},
{
"_id": "22",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 23,
"line_number": "1.1.20",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Whose soldier now, under whose blessed cross"
},
"_type": "line"
},
{
"_id": "24",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 25,
"line_number": "1.1.22",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Forthwith a power of English shall we levy;"
},
"_type": "line"
},
{
"_id": "25",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 26,
"line_number": "1.1.23",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Whose arms were moulded in their mothers womb"
},
"_type": "line"
},
{
"_id": "26",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 27,
"line_number": "1.1.24",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "To chase these pagans in those holy fields"
},
"_type": "line"
},
{
"_id": "29",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 30,
"line_number": "1.1.27",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "For our advantage on the bitter cross."
},
"_type": "line"
},
{
"_id": "40",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 41,
"line_number": "1.1.38",
"play_name": "Henry IV",
"speaker": "WESTMORELAND",
"speech_number": 2,
"text_entry": "Whose worst was, that the noble Mortimer,"
},
"_type": "line"
},
{
"_id": "41",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 42,
"line_number": "1.1.39",
"play_name": "Henry IV",
"speaker": "WESTMORELAND",
"speech_number": 2,
"text_entry": "Leading the men of Herefordshire to fight"
},
"_type": "line"
}
],
"max_score": 1.0,
"total": 111396
},
"timed_out": false,
"took": 28
}
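The document counts in the aggregation buckets seem far too low. I expected to see the following document counts per speaker (which I computed by explicitly tallying the speakers across the whole data set):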
GLOUCESTER 1920
HAMLET 1582
IAGO 1161
FALSTAFF 1117
KING HENRY V 1086
BRUTUS 1051
OTHELLO 928
MARK ANTONY 927
KING HENRY VI 917
DUKE VINCENTIO 909
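A manual tally like the one above could be reproduced along these lines. This is only a sketch: it assumes the sample file is in Elasticsearch bulk format (an action line such as {"index": {"_type": "line", ...}} followed by the document source), and the names count_speakers and SAMPLE are hypothetical, not part of any library.

```python
import json
from collections import Counter


def count_speakers(bulk_lines, doc_type=None):
    """Tally the `speaker` field over bulk-format lines.

    Assumes each document source is preceded by an action line of the
    form {"index": {"_type": ...}}. Documents with an empty speaker
    are skipped. If doc_type is given, only that type is counted.
    """
    counts = Counter()
    current_type = None
    for raw in bulk_lines:
        raw = raw.strip()
        if not raw:
            continue
        obj = json.loads(raw)
        action = obj.get("index")
        if isinstance(action, dict):
            # Action line: remember the type of the document that follows.
            current_type = action.get("_type")
            continue
        if doc_type is not None and current_type != doc_type:
            continue
        speaker = obj.get("speaker")
        if speaker:
            counts[speaker] += 1
    return counts


# A few documents shaped like the hits shown in the search response.
SAMPLE = [
    '{"index": {"_index": "shakespeare", "_type": "act", "_id": 0}}',
    '{"line_id": 1, "play_name": "Henry IV", "speaker": "", "text_entry": "ACT I"}',
    '{"index": {"_index": "shakespeare", "_type": "line", "_id": 14}}',
    '{"line_id": 15, "play_name": "Henry IV", "speaker": "KING HENRY IV"}',
    '{"index": {"_index": "shakespeare", "_type": "line", "_id": 19}}',
    '{"line_id": 20, "play_name": "Henry IV", "speaker": "KING HENRY IV"}',
    '{"index": {"_index": "shakespeare", "_type": "line", "_id": 40}}',
    '{"line_id": 41, "play_name": "Henry IV", "speaker": "WESTMORELAND"}',
]

if __name__ == "__main__":
    for speaker, n in count_speakers(SAMPLE).most_common(10):
        print(speaker, n)
```

For the real file, the same function could be fed an open file handle, for example count_speakers(open("shakespeare.json"), doc_type="line"), where the local path is an assumption.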
I spent hours searching online for the cause of this problem, but I could not figure it out. What am I doing wrong?
Answer 0 (score: 0)
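The root cause was an error in the mapping and in the way the data was searched. The mapping with the not_analyzed raw sub-fields was set only for doc_type 'act' when it should have been set for doc_type 'line', and the search should also be restricted to doc_type 'line'.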
Detailed answer:
Following the example on this page: https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html I realized that the error was in the mapping.
Before:
After:
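The Elasticsearch results now match the counts I obtained manually from this data set. The current counts for the top 10 speakers in doc_type 'line' are as follows:
GLOUCESTER 1907
HAMLET 1572
IAGO 1153
FALSTAFF 1109
KING HENRY V 1076
BRUTUS 1043
OTHELLO 928
MARK ANTONY 915
KING HENRY VI 909
DUKE VINCENTIO 901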
Here is the correct mapping:
{
"shakespeare" : {
"mappings" : {
"line" : {
"properties" : {
"line_id" : {
"type" : "integer"
},
"line_number" : {
"type" : "string"
},
"play_name" : {
"type" : "string",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"speaker" : {
"type" : "string",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"speech_number" : {
"type" : "integer"
},
"text_entry" : {
"type" : "string"
}
}
},
"act" : {
"properties" : {
"line_id" : {
"type" : "integer"
},
"line_number" : {
"type" : "string"
},
"play_name" : {
"type" : "string",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"speaker" : {
"type" : "string",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"speech_number" : {
"type" : "integer"
},
"text_entry" : {
"type" : "string"
}
}
},
"scene" : {
"properties" : {
"line_id" : {
"type" : "integer"
},
"line_number" : {
"type" : "string"
},
"play_name" : {
"type" : "string",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"speaker" : {
"type" : "string",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"speech_number" : {
"type" : "integer"
},
"text_entry" : {
"type" : "string"
}
}
}
}
}
}
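The mapping above repeats the same properties for all three types, so it may be easier to generate than to write by hand. The following is a minimal sketch under that assumption; build_mappings and string_with_raw are hypothetical helper names, and the returned dictionary is meant as a request body for creating the index, not as the exact commands used here.

```python
import json


def string_with_raw():
    """An analyzed string field with a not_analyzed `raw` sub-field."""
    return {
        "type": "string",
        "fields": {
            "raw": {"type": "string", "index": "not_analyzed"}
        },
    }


def build_mappings(doc_types=("line", "act", "scene")):
    """Build an index-creation body giving every type the same properties."""
    mappings = {}
    for doc_type in doc_types:
        mappings[doc_type] = {
            "properties": {
                "line_id": {"type": "integer"},
                "line_number": {"type": "string"},
                "play_name": string_with_raw(),
                "speaker": string_with_raw(),
                "speech_number": {"type": "integer"},
                "text_entry": {"type": "string"},
            }
        }
    return {"mappings": mappings}


if __name__ == "__main__":
    # Print the body so it can be inspected or saved to a file.
    print(json.dumps(build_mappings(), indent=2))
```

Documents indexed before a sub-field such as speaker.raw exists are generally not back-filled into it, so the data would likely need to be loaded again after the index is created with this mapping.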
With the new mapping, the results look correct:
$ curl -XPOST "https://my_es_id/shakespeare/line/_search" -d'
{
  "aggs": {
    "speakers": {
      "terms": { "field": "speaker.raw" }
    }
  }
}'
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"aggregations": {
"speakers": {
"buckets": [
{
"doc_count": 1907,
"key": "GLOUCESTER"
},
{
"doc_count": 1572,
"key": "HAMLET"
},
{
"doc_count": 1153,
"key": "IAGO"
},
{
"doc_count": 1109,
"key": "FALSTAFF"
},
{
"doc_count": 1076,
"key": "KING HENRY V"
},
{
"doc_count": 1043,
"key": "BRUTUS"
},
{
"doc_count": 928,
"key": "OTHELLO"
},
{
"doc_count": 915,
"key": "MARK ANTONY"
},
{
"doc_count": 909,
"key": "KING HENRY VI"
},
{
"doc_count": 901,
"key": "DUKE VINCENTIO"
}
],
"doc_count_error_upper_bound": 461,
"sum_other_doc_count": 94715
}
},
"hits": {
"hits": [
{
"_id": "14",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 15,
"line_number": "1.1.12",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Did lately meet in the intestine shock"
},
"_type": "line"
},
{
"_id": "19",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 20,
"line_number": "1.1.17",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "The edge of war, like an ill-sheathed knife,"
},
"_type": "line"
},
{
"_id": "22",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 23,
"line_number": "1.1.20",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Whose soldier now, under whose blessed cross"
},
"_type": "line"
},
{
"_id": "24",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 25,
"line_number": "1.1.22",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Forthwith a power of English shall we levy;"
},
"_type": "line"
},
{
"_id": "25",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 26,
"line_number": "1.1.23",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "Whose arms were moulded in their mothers womb"
},
"_type": "line"
},
{
"_id": "26",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 27,
"line_number": "1.1.24",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "To chase these pagans in those holy fields"
},
"_type": "line"
},
{
"_id": "29",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 30,
"line_number": "1.1.27",
"play_name": "Henry IV",
"speaker": "KING HENRY IV",
"speech_number": 1,
"text_entry": "For our advantage on the bitter cross."
},
"_type": "line"
},
{
"_id": "40",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 41,
"line_number": "1.1.38",
"play_name": "Henry IV",
"speaker": "WESTMORELAND",
"speech_number": 2,
"text_entry": "Whose worst was, that the noble Mortimer,"
},
"_type": "line"
},
{
"_id": "41",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 42,
"line_number": "1.1.39",
"play_name": "Henry IV",
"speaker": "WESTMORELAND",
"speech_number": 2,
"text_entry": "Leading the men of Herefordshire to fight"
},
"_type": "line"
},
{
"_id": "44",
"_index": "shakespeare",
"_score": 1.0,
"_source": {
"line_id": 45,
"line_number": "1.1.42",
"play_name": "Henry IV",
"speaker": "WESTMORELAND",
"speech_number": 2,
"text_entry": "A thousand of his people butchered;"
},
"_type": "line"
}
],
"max_score": 1.0,
"total": 106228
},
"timed_out": false,
"took": 48
}
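The response above still carries ten full hits alongside the aggregation. When only the counts matter, the request body can set the top-level "size" to 0 so that no hits are returned, and the terms aggregation's own "size" controls how many buckets come back. The sketch below builds such a body and pulls the counts out of a response; build_speaker_query and top_speakers are hypothetical helper names, and sending the request is left out.

```python
import json


def build_speaker_query(top_n=10):
    """Search body that returns only the speaker buckets, no hits."""
    return {
        "size": 0,  # suppress the hits section of the response
        "aggs": {
            "speakers": {
                "terms": {"field": "speaker.raw", "size": top_n}
            }
        },
    }


def top_speakers(response):
    """Extract (speaker, doc_count) pairs from a parsed search response."""
    buckets = response["aggregations"]["speakers"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


if __name__ == "__main__":
    # The printed JSON can be passed to curl with -d, as in the commands above.
    print(json.dumps(build_speaker_query(), indent=2))
```

Note that a non-zero doc_count_error_upper_bound, such as the 461 in the response above, means the bucket counts may be approximate across shards.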