Sort by document count in ElasticSearch

Date: 2012-08-11 12:43:03

Tags: elasticsearch

I am storing user relationships in an ES index:

{'id'=> 1, 'User_id_1'=> '2001', 'relation'=> 'friend', 'User_id_2'=> '1002'}
{'id'=> 2, 'User_id_1'=> '2002', 'relation'=> 'friend', 'User_id_2'=> '1002'}
{'id'=> 3, 'User_id_1'=> '2002', 'relation'=> 'friend', 'User_id_2'=> '1001'}
{'id'=> 4, 'User_id_1'=> '2003', 'relation'=> 'friend', 'User_id_2'=> '1003'}

Now I want to get the user_id_2 that has the most friends.

In the case above, that is 1002, because 2001 and 2002 are its friends (count = 2).
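The ranking being asked for boils down to: keep the friend relations, group them by user_id_2, count each group, and sort by count in descending order. A rough client-side sketch of those semantics only (not an Elasticsearch query), assuming each relation is written as one line `user_id_1 relation user_id_2` and using a hypothetical helper `count_friends`:

```shell
#!/usr/bin/env bash
# count_friends (hypothetical helper): reads "user_id_1 relation user_id_2"
# lines on stdin and prints "count user_id_2", most friends first.
# Ties are broken by user id so the output order is deterministic.
count_friends() {
  awk '$2 == "friend" { n[$3]++ } END { for (id in n) print n[id], id }' |
    sort -k1,1nr -k2,2
}

# The four relations from the question, one per line.
relations='2001 friend 1002
2002 friend 1002
2002 friend 1001
2003 friend 1003'

printf '%s\n' "$relations" | count_friends
```

On the four sample relations it lists `2 1002` first, followed by `1 1001` and `1 1003`.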

I can't figure out the query.

Thanks.

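EDIT:

As @imotov suggested, the terms facet is a very good option, but my problem is that I have 2 indexes.

The first index stores the main documents and the second one stores the relationships.

Now the problem is this: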

Suppose I have 100 USER docs in my main index and only 50 of them have any relationships, so only those 50 users appear in my relationship index.

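So when I apply the terms facet, it sorts the results and gives me the correct output, but I lose the 50 users who don't have any relationships yet. I need those 50 users in my final output too, after the sorted users.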
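To make the required ordering concrete: every user from the main index should appear, ranked by friend count, with users that have no relations (count 0) at the end. The shell illustration below computes that target output client-side from made-up user ids 1001 to 1005 and the sample relations; it only shows the expected result and is not an Elasticsearch query:

```shell
#!/usr/bin/env bash
# Illustration of the desired output only (client-side, not an ES query):
# every user from the main index, ranked by friend count, with users that
# have no relations (count 0) last. User ids 1001-1005 are made-up samples.
users='1001
1002
1003
1004
1005'
relations='2001 friend 1002
2002 friend 1002
2002 friend 1001
2003 friend 1003'

# First input (relations): count friends per user_id_2.
# Second input (users): print "count user_id" for every user, 0 if absent.
desired=$(awk 'NR == FNR { if ($2 == "friend") n[$3]++; next }
               { c = ($1 in n) ? n[$1] : 0; print c, $1 }' \
            <(printf '%s\n' "$relations") <(printf '%s\n' "$users") |
          sort -k1,1nr -k2,2)
echo "$desired"
```

Users 1004 and 1005 have no relations, so they come last with a count of 0.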

1 Answer:

Answer 0 (score: 1)

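First, we need to make sure that the relationships stored in ES are unique. This can be done by replacing the arbitrary ids with ids built from user_id_1, relation and user_id_2. We also need to make sure that the analyzer used for the user ids doesn't split them into multiple tokens; if the ids are strings, they have to be indexed as not_analyzed. With these two conditions satisfied, we can simply run a terms facet on the field user_id_2 over the results restricted to relation: friend. This query retrieves the top user_id_2 ids, sorted by the number of times they occur in the index. All together, it looks like this: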

curl -XPUT http://localhost:9200/relationships -d '{
    "mappings" : {
        "relation" : {
            "_source" : {"enabled" : false },
            "properties" : {
                "user_id_1": { "type": "string", "index" : "not_analyzed"},
                "relation": { "type": "string", "index" : "not_analyzed"},
                "user_id_2": { "type": "string", "index" : "not_analyzed"}
            }
        }
    }
}'

curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
echo


curl -XGET 'http://localhost:9200/relationships/relation/_search?pretty=true&search_type=count' -d '{
  "query": {
    "term" : {
      "relation" : "friend"
    }
  },
  "facets" : {
      "popular" : {
          "terms" : {
              "field" : "user_id_2"
          }
      }
  }
}'
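The uniqueness requirement at the start of this answer matters because the facet counts documents: a friendship indexed twice under two arbitrary ids would be counted twice. Indexing under an id derived from the three fields makes a repeated PUT overwrite the existing document instead. The toy model below only illustrates that effect; `relation_id` and `put_relation` are hypothetical helpers, and a bash 4+ associative array stands in for the index:

```shell
#!/usr/bin/env bash
# relation_id (hypothetical helper): builds the document id from the three
# fields, following the "<user_id_1>-<relation>-<user_id_2>" scheme used in
# the URLs of the curl commands above.
relation_id() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Stand-in for the index (requires bash 4+): a PUT to an id that already
# exists replaces that document, which an associative array models directly.
declare -A index
put_relation() {
  index["$(relation_id "$1" "$2" "$3")"]="$1 $2 $3"
}

put_relation 2001 friend 1002
put_relation 2002 friend 1002
put_relation 2001 friend 1002   # same friendship submitted a second time

echo "${#index[@]} documents"   # the repeat overwrote, so this prints "2 documents"
```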

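Note that, because facets are computed in a distributed way, the counts reported by the facet query may be lower than the actual number of records when more than one shard is used. See elasticsearch issue 1832.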
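A simplified picture of why the counts can come out low: each shard reports only its own top N terms and the partial lists are summed per term, so a term that misses the cut on one shard loses that shard's occurrences. This is a toy model of that counting effect, not Elasticsearch code; the shard contents and N = 1 are made up for illustration:

```shell
#!/usr/bin/env bash
# Toy model of a distributed terms facet (illustration, not Elasticsearch code).
# Each shard counts its own terms and reports only its top N; the partial
# lists are then summed per term.
N=1

shard_top() {  # stdin: one user_id_2 per line; stdout: "count id", top N only
  sort | uniq -c | sort -k1,1nr -k2,2 | head -n "$N"
}

sum_counts() {  # stdin: "count id" lines; stdout: summed "count id" lines
  awk '{ n[$2] += $1 } END { for (id in n) print n[id], id }' | sort -k1,1nr -k2,2
}

# Made-up shard contents: user_id_2 values of the friend relations per shard.
shard_a='1002
1002
1001'
shard_b='1001
1001
1002'

merged=$( { printf '%s\n' "$shard_a" | shard_top
            printf '%s\n' "$shard_b" | shard_top; } | sum_counts )
# Exact counts for comparison: every occurrence counted once.
exact=$(printf '%s\n%s\n' "$shard_a" "$shard_b" | sed 's/^/1 /' | sum_counts)

echo "facet:"; echo "$merged"   # 2 1001 / 2 1002
echo "exact:"; echo "$exact"    # 3 1001 / 3 1002
```

Both ids really occur 3 times, but each shard drops its runner-up, so the merged result reports 2 for each.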

EDIT:

There are two solutions to the edited question. One is to run the facet over two fields:

curl -XPUT http://localhost:9200/relationships -d '{
    "mappings" : {
        "relation" : {
            "_source" : {"enabled" : false },
            "properties" : {
                "user_id_1": { "type": "string", "index" : "not_analyzed"},
                "relation": { "type": "string", "index" : "not_analyzed"},
                "user_id_2": { "type": "string", "index" : "not_analyzed"}
            }
        }
    }
}'
curl -XPUT http://localhost:9200/users -d '{
    "mappings" : {
        "user" : {
            "_source" : {"enabled" : false },
            "properties" : {
                "user_id": { "type": "string", "index" : "not_analyzed"}
            }
        }
    }
}'

curl -XPUT http://localhost:9200/users/user/1001 -d '{"user_id": 1001}'
curl -XPUT http://localhost:9200/users/user/1002 -d '{"user_id": 1002}'
curl -XPUT http://localhost:9200/users/user/1003 -d '{"user_id": 1003}'
curl -XPUT http://localhost:9200/users/user/1004 -d '{"user_id": 1004}'
curl -XPUT http://localhost:9200/users/user/1005 -d '{"user_id": 1005}'
curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
curl -XPOST http://localhost:9200/users/_refresh
echo


curl -XGET 'http://localhost:9200/relationships,users/_search?pretty=true&search_type=count' -d '{
    "query": {
        "indices" : {
          "indices" : ["relationships"],
          "query" : {
              "filtered" : {
                  "query" : {
                      "term" : {
                          "relation" : "friend"
                      }
                  },
                  "filter" : {
                      "type" : {
                          "value" : "relation"
                      }
                  }
              }
          },
          "no_match_query" : {
              "filtered" : {
                  "query" : {
                      "match_all" : { }
                  },
                  "filter" : {
                      "type" : {
                          "value" : "user"
                      }
                  }
              }
          }
        }
    },
    "facets" : {
        "popular" : {
          "terms" : {
              "fields" : ["user_id", "user_id_2"]
          }
        }
    }
}'
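Why this works: the search matches every user document plus the friend relations, and the facet counts values from both fields, so each user document adds exactly 1 to its own id. A registered user's total is therefore its friend count plus one, and users without any relation still show up with a count of 1. A toy model of that counting (illustration only, not Elasticsearch code), using the same sample data as the curl commands above:

```shell
#!/usr/bin/env bash
# Toy model of the two-field terms facet (illustration, not ES code).
# Matching docs: all "user" docs (field user_id) plus the "friend" relations
# (field user_id_2). The facet counts every value of either field.
users='1001
1002
1003
1004
1005'
relations='2001 friend 1002
2002 friend 1002
2002 friend 1001
2003 friend 1003'

facet=$( { printf '%s\n' "$users"
           printf '%s\n' "$relations" | awk '$2 == "friend" { print $3 }'; } |
         awk '{ n[$1]++ } END { for (id in n) print n[id], id }' |
         sort -k1,1nr -k2,2 )
echo "$facet"

# Each user doc adds exactly 1, so for ids that exist in the users index the
# real friend count is the facet total minus 1.
echo "$facet" | awk '{ print $1 - 1, $2 }'
```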

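The other solution is to add a "self" relation to the relationships index for each user when the user is created. I prefer the second solution because it seems less complicated.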
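A toy model of the self-relation variant, under two assumptions the answer does not spell out: the self relation is stored as `<id> self <id>`, and the facet query is widened to match relation `friend` or `self`. `add_user` is a hypothetical creation hook, and this only models the counting, not Elasticsearch itself:

```shell
#!/usr/bin/env bash
# Toy model of the "self" relation approach (illustration, not ES code).
# Assumptions: creating a user also stores "<id> self <id>", and the facet
# query matches relation "friend" OR "self".
relations='2001 friend 1002
2002 friend 1002
2002 friend 1001
2003 friend 1003'

add_user() {  # hypothetical hook run when a user is created
  relations+=$'\n'"$1 self $1"
}
for id in 1001 1002 1003 1004 1005; do add_user "$id"; done

counts=$(printf '%s\n' "$relations" |
         awk '$2 == "friend" || $2 == "self" { n[$3]++ }
              END { for (id in n) print n[id], id }' |
         sort -k1,1nr -k2,2)
echo "$counts"
```

Every user now appears at least once, so users without friends are no longer missing; as with the two-field facet, subtracting 1 recovers the actual friend count.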