I am storing user relationships in an Elasticsearch index,
i.e.
{'id'=> 1, 'user_id_1'=> '2001', 'relation'=> 'friend', 'user_id_2'=> '1002'}
{'id'=> 2, 'user_id_1'=> '2002', 'relation'=> 'friend', 'user_id_2'=> '1002'}
{'id'=> 3, 'user_id_1'=> '2002', 'relation'=> 'friend', 'user_id_2'=> '1001'}
{'id'=> 4, 'user_id_1'=> '2003', 'relation'=> 'friend', 'user_id_2'=> '1003'}
Now I want to get the user_id_2 that has the most friends.
In the case above that is 1002, since 2001 and 2002 are its friends (count = 2).
I can't figure out the query.
Thanks.
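Edit:
As @imotov suggested, a terms facet is a very good choice, but
the problem I am facing is that I have 2 indices.
The first index stores the main documents and the second index stores the relationships.
Now the problem is this:
Suppose I have 100 user docs in my main index and only 50 of them have relationships, so my relationships index only covers 50 users.
When I run the terms facet, it sorts the results and gives the correct output, but it misses the 50 users who have no relationships yet. I need those 50 users to appear in my final output, after the 50 sorted users.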
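Answer 0 (score: 1)
First, we need to make sure that the relationships stored in ES are unique. This can be done by replacing the arbitrary id with an id constructed from user_id_1, relation and user_id_2. We also need to make sure that the analyzer for the user ids doesn't produce multiple tokens. If the ids are strings, they have to be indexed as not_analyzed. With both conditions satisfied, we can simply run a terms facet on the field user_id_2 against a result list restricted to relation: friend. This query returns the user_id_2 values that occur most often in the index, sorted by number of occurrences. Altogether, it looks like this: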
curl -XPUT http://localhost:9200/relationships -d '{
  "mappings" : {
    "relation" : {
      "_source" : {"enabled" : false },
      "properties" : {
        "user_id_1": { "type": "string", "index" : "not_analyzed"},
        "relation": { "type": "string", "index" : "not_analyzed"},
        "user_id_2": { "type": "string", "index" : "not_analyzed"}
      }
    }
  }
}'
curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
echo
curl -XGET 'http://localhost:9200/relationships/relation/_search?pretty=true&search_type=count' -d '{
  "query": {
    "term" : {
      "relation" : "friend"
    }
  },
  "facets" : {
    "popular" : {
      "terms" : {
        "field" : "user_id_2"
      }
    }
  }
}'
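To make the effect of the two conditions concrete, here is a minimal in-memory sketch in Python. It does not talk to Elasticsearch; the helper names (relation_id, index_relationships, popular) are hypothetical and only model what the deterministic ids and the terms facet on user_id_2 are expected to do. Breaking ties by user id is a choice made for this sketch.

```python
from collections import Counter

# Sample relationships from the question: (user_id_1, relation, user_id_2).
# The last row is a deliberate duplicate of the first one.
relationships = [
    ("2001", "friend", "1002"),
    ("2002", "friend", "1002"),
    ("2002", "friend", "1001"),
    ("2003", "friend", "1003"),
    ("2001", "friend", "1002"),
]


def relation_id(user_id_1, relation, user_id_2):
    """Build the deterministic document id, e.g. '2001-friend-1002'."""
    return f"{user_id_1}-{relation}-{user_id_2}"


def index_relationships(rows):
    """Model the index as a dict keyed by id, so duplicates overwrite each other."""
    index = {}
    for user_id_1, relation, user_id_2 in rows:
        index[relation_id(user_id_1, relation, user_id_2)] = {
            "user_id_1": user_id_1,
            "relation": relation,
            "user_id_2": user_id_2,
        }
    return index


def popular(index, relation="friend", size=10):
    """Count user_id_2 values among documents matching the relation, highest first."""
    counts = Counter(
        doc["user_id_2"] for doc in index.values() if doc["relation"] == relation
    )
    # Sort by count descending, then by id so that ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:size]


index = index_relationships(relationships)
top = popular(index)
print(top)
```

Because the duplicate row maps to the same id, it is stored only once and user 1002 is counted with 2 friends rather than 3.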
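Note that, due to the distributed nature of the facet computation, the counts reported by the facet query can be lower than the actual number of records if multiple shards are used. See elasticsearch issue 1832.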
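The undercounting can be illustrated with a simplified model in which every shard reports only its own top terms and the coordinating node sums what it receives. This is an assumption-laden sketch, not Elasticsearch's actual implementation; the shard contents and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical user_id_2 values of the documents held by two shards.
shard_a = ["1002", "1002", "1002", "1001", "1001", "1003"]
shard_b = ["1003", "1003", "1003", "1004", "1004", "1001"]
shards = [shard_a, shard_b]


def shard_top_terms(values, size):
    """Each shard returns only its `size` most frequent terms."""
    return Counter(values).most_common(size)


def merged_facet(all_shards, size):
    """The coordinator sums the partial lists it received from the shards."""
    merged = Counter()
    for values in all_shards:
        for term, count in shard_top_terms(values, size):
            merged[term] += count
    return dict(merged)


def exact_counts(all_shards):
    """Ground truth: count over every document on every shard."""
    return dict(Counter(value for values in all_shards for value in values))


reported = merged_facet(shards, size=2)
actual = exact_counts(shards)
print(reported)
print(actual)
```

In this model user 1003 is reported with 3 instead of 4, because its single document on the first shard did not make that shard's top 2. With a single shard, or with few enough distinct terms that every shard returns all of them, the reported and exact counts agree.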
Edit:
There are two solutions for the edited question. One solution is to use a facet on two fields:
curl -XPUT http://localhost:9200/relationships -d '{
  "mappings" : {
    "relation" : {
      "_source" : {"enabled" : false },
      "properties" : {
        "user_id_1": { "type": "string", "index" : "not_analyzed"},
        "relation": { "type": "string", "index" : "not_analyzed"},
        "user_id_2": { "type": "string", "index" : "not_analyzed"}
      }
    }
  }
}'
curl -XPUT http://localhost:9200/users -d '{
  "mappings" : {
    "user" : {
      "_source" : {"enabled" : false },
      "properties" : {
        "user_id": { "type": "string", "index" : "not_analyzed"}
      }
    }
  }
}'
curl -XPUT http://localhost:9200/users/user/1001 -d '{"user_id": 1001}'
curl -XPUT http://localhost:9200/users/user/1002 -d '{"user_id": 1002}'
curl -XPUT http://localhost:9200/users/user/1003 -d '{"user_id": 1003}'
curl -XPUT http://localhost:9200/users/user/1004 -d '{"user_id": 1004}'
curl -XPUT http://localhost:9200/users/user/1005 -d '{"user_id": 1005}'
curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
curl -XPOST http://localhost:9200/users/_refresh
echo
curl -XGET 'http://localhost:9200/relationships,users/_search?pretty=true&search_type=count' -d '{
  "query": {
    "indices" : {
      "indices" : ["relationships"],
      "query" : {
        "filtered" : {
          "query" : {
            "term" : {
              "relation" : "friend"
            }
          },
          "filter" : {
            "type" : {
              "value" : "relation"
            }
          }
        }
      },
      "no_match_query" : {
        "filtered" : {
          "query" : {
            "match_all" : { }
          },
          "filter" : {
            "type" : {
              "value" : "user"
            }
          }
        }
      }
    }
  },
  "facets" : {
    "popular" : {
      "terms" : {
        "fields" : ["user_id", "user_id_2"]
      }
    }
  }
}'
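One consequence of the two-field facet is worth spelling out: every user document contributes one occurrence through user_id, so each reported count should be the number of friends plus one. The following in-memory Python sketch models that with the sample data above; it does not query Elasticsearch, and the function names are hypothetical.

```python
from collections import Counter

# user_id values from the users index and user_id_2 values of the
# friend relations, taken from the sample documents above.
user_ids = ["1001", "1002", "1003", "1004", "1005"]
friend_targets = ["1002", "1002", "1001", "1003"]


def two_field_facet(users, targets):
    """Model a terms facet over both fields: one shared counter for all values."""
    counts = Counter(users)
    counts.update(targets)
    return counts


def friend_ranking(users, targets):
    """Subtract the occurrence contributed by the user document itself."""
    facet = two_field_facet(users, targets)
    ranking = [(user_id, facet[user_id] - 1) for user_id in users]
    # Most friends first; ties are ordered by id (a choice made for this sketch).
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


ranking = friend_ranking(user_ids, friend_targets)
print(ranking)
```

Users 1004 and 1005 now show up at the end with 0 friends. In a real query the facet size would presumably also have to be raised enough to cover every user, since a terms facet returns only a limited number of entries by default.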
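Another solution is to add a "self" relation to the relationships index for each user when the user is created. I would prefer the second solution since it seems less complicated.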
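A minimal sketch of the "self" relation idea, again as an in-memory Python model rather than real Elasticsearch calls. It assumes the self relation is stored with the relation value "self" and the id pattern user-self-user, and that the facet is computed over documents whose relation is either "friend" or "self"; none of these details are specified above, so treat them as illustrative.

```python
from collections import Counter


def put_relation(index, user_id_1, relation, user_id_2):
    """Store a relation under its deterministic id (duplicates overwrite)."""
    doc_id = f"{user_id_1}-{relation}-{user_id_2}"
    index[doc_id] = {
        "user_id_1": user_id_1,
        "relation": relation,
        "user_id_2": user_id_2,
    }


def create_user(index, user_id):
    """Assumed convention: creating a user also indexes a 'self' relation."""
    put_relation(index, user_id, "self", user_id)


def friend_ranking(index):
    """Count user_id_2 over friend and self relations, then drop the self hit."""
    counts = Counter(
        doc["user_id_2"]
        for doc in index.values()
        if doc["relation"] in ("friend", "self")
    )
    registered = [
        doc["user_id_2"] for doc in index.values() if doc["relation"] == "self"
    ]
    ranking = [(user_id, counts[user_id] - 1) for user_id in registered]
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


relations = {}
for user_id in ["1001", "1002", "1003", "1004", "1005"]:
    create_user(relations, user_id)
put_relation(relations, "2001", "friend", "1002")
put_relation(relations, "2002", "friend", "1002")
put_relation(relations, "2002", "friend", "1001")
put_relation(relations, "2003", "friend", "1003")

self_ranking = friend_ranking(relations)
print(self_ranking)
```

With this layout a single index and a single-field facet on user_id_2 are enough, which is what makes the approach simpler than querying two indices at once.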