I want to do a kind of group_by with Elasticsearch.
I have a text field stored:
"text": {
"type": "string",
"store": true,
"analyzer" : "twittsTokenizor"
},
and a geo_point field:
"geo": {
"type": "geo_point",
"store": true
}
I am trying to get the most frequent words from my text field, grouped by location, and... it doesn't work.
A query like this one works (when I hard-code the location in the query):
curl -XGET http://localhost:9200/twitter/_search -d '{
  "query" : {
    "match_all" : {}
  },
  "filter" : {
    "bool" : {
      "must" : [
        {
          "range" : {
            "created_at" : {
              "from" : "Mon Feb 22 14:04:23 +0000 2015",
              "to" : "Wed Feb 23 22:06:25 +0000 2015"
            }
          }
        },
        {
          "geo_distance" : {
            "distance" : "100km",
            "geo" : { "lat" : 48.856506, "lon" : 2.352133 }
          }
        }
      ]
    }
  },
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "text",
        "size" : 10
      }
    }
  }
}'
But this does not work:
curl -XGET http://localhost:9200/twitter/_search -d '{
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "geo1" : {
      "terms" : {
        "field" : "geo"
      }
    },
    "tag" : {
      "terms" : {
        "field" : "text",
        "size" : 10
      }
    }
  }
}'
and neither does this:
curl -XGET http://localhost:9200/twitter/_search -d '{
  "query" : {
    "match_all" : {}
  },
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "text",
        "size" : 10
      }
    },
    "geo1" : {
      "terms" : {
        "field" : "geo"
      }
    }
  }
}'
And facet_filters don't do the job either.
What am I doing wrong? Is it even possible? Thank you very much.
Edit: here is my mapping:
curl -s -XPUT "http://localhost:9200/twitter" -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "twittsTokenizor" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [
            "french_elision",
            "asciifolding",
            "lowercase",
            "french_stop",
            "english_stop",
            "twitter_stop"
          ]
        }
      },
      "filter" : {
        "french_elision" : {
          "type" : "elision",
          "articles" : [ "l", "m", "t", "qu", "n", "s",
                         "j", "d", "c", "jusqu", "quoiqu",
                         "lorsqu", "puisqu" ]
        },
        "asciifolding" : {
          "type" : "asciifolding"
        },
        "french_stop": {
          "type": "stop",
          "stopwords": "_french_"
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "twitter_stop" : {
          "type" : "stop",
          "stopwords": ["RT", "FAV", "TT", "FF", "rt"]
        }
      }
    }
  },
  "mappings": {
    "twitter": {
      "properties": {
        "id": {
          "type": "long",
          "store": true
        },
        "text": {
          "type": "string",
          "store": true,
          "analyzer" : "twittsTokenizor"
        },
        "created_at": {
          "type": "date",
          "format": "EE MMM d HH:mm:ss Z yyyy",
          "store": true
        },
        "location": {
          "type": "string",
          "store": true
        },
        "geo": {
          "type": "geo_point",
          "store": true
        }
      }
    }
  }
}'
and a data sample:
{ "_id" : ObjectId("54eb3c35a710901a698b4567"), "country" : "FR", "created_at" : "Mon Feb 23 14:25:30 +0000 2015", "geo" : { "lat" : 49.119696, "lon" : 6.176355 }, "id" : -812216320, "location" : "Metz ", "text" : "Passer des vacances formidable avec des gens géniaux sans aucunes pression avec pour seul soucis s'éclater et se laisser vivre #BONHEUR"}
Answer 0 (score: 0)
If I understand correctly, you want to get the most frequently used terms per geo location? Then you need two levels of aggregation: first a geohash_grid aggregation, then a terms aggregation nested inside it:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    // Buckets for each geohash grid cell
    "geo1": {
      "geohash_grid": {
        "field": "geo",
        "precision": 5
      },
      "aggs": {
        // Buckets for each unique text tag in this geo bucket, at most 10 buckets
        "tags": {
          "terms": {
            "field": "text",
            "size": 10
          }
        }
      }
    }
  }
}
Alternatively, you could use a geo_distance aggregation.
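For completeness, a geo_distance version might look like the sketch below. It buckets documents into distance rings around an origin and nests the same terms aggregation inside each ring. This is untested; the field name "geo" and the coordinates are taken from the question, while the bucket names "rings" and "tags", the unit, and the range boundaries are arbitrary choices — check the geo_distance aggregation documentation for your Elasticsearch version:

```json
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "rings": {
      "geo_distance": {
        "field": "geo",
        "origin": { "lat": 48.856506, "lon": 2.352133 },
        "unit": "km",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 300 },
          { "from": 300 }
        ]
      },
      "aggs": {
        "tags": {
          "terms": { "field": "text", "size": 10 }
        }
      }
    }
  }
}
```

Unlike geohash_grid, this gives a fixed set of buckets relative to one point, which is closer in spirit to the working geo_distance filter in the question.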