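My Neo4j database currently has more than 13 million nodes and a large number of edges. The simplified structure is as follows (most edge types omitted):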
User - HAS_EMAIL -> Email // is unique
- HAS_IBAN -> Iban // is unique
- HAS_PHONE -> Phone // is unique
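I want to get the IDs of all users that are connected to one another, regardless of the length of the path. That way I can ...
I started out with a Cypher query like the following, sent through Neo4j's HTTP API.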
MATCH (u:User {uid: '12345'})-[*1..]-(otherUser)
RETURN DISTINCT otherUser
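Using variable-length pattern matching with no upper bound, and in particular without a LIMIT, this is slow.
So I dug around a bit and found the APOC library with its expandConfig procedure.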
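One likely reason the unbounded pattern is slow: a variable-length pattern enumerates paths rather than nodes, and Cypher only forbids reusing a relationship within a single path, so the number of paths can grow much faster than the number of reachable nodes. The following is a rough Python sketch of that effect, not Neo4j itself; the toy graph (a complete graph on 5 nodes) and the helper names `count_trails` and `complete_graph` are made up for illustration.

```python
from itertools import combinations


def count_trails(adj, start):
    """Count paths of length >= 1 from `start` that never reuse an edge.

    This mimics relationship uniqueness within a single path; nodes may
    be revisited as long as each edge is used at most once.
    """
    def walk(node, used):
        total = 0
        for nxt in adj[node]:
            edge = frozenset((node, nxt))
            if edge not in used:
                # One path ends here, plus every extension of it.
                total += 1 + walk(nxt, used | {edge})
        return total

    return walk(start, frozenset())


def complete_graph(n):
    """Build an undirected complete graph as an adjacency dict."""
    adj = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        adj[a].add(b)
        adj[b].add(a)
    return adj


adj = complete_graph(5)
trails = count_trails(adj, 0)   # paths the pattern would have to enumerate
reachable = len(adj) - 1        # distinct other nodes actually reachable
```

Even on this tiny graph `trails` is far larger than `reachable`, and `RETURN DISTINCT` only removes the duplicates after all of those paths have been produced.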
MATCH (u:User {uid: '12345'})
CALL apoc.path.expandConfig(u, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
// Extracting the 'uid' property
RETURN extract(
n IN (
// We only want 'User' nodes
filter (
x IN NODES(path) WHERE 'User' IN labels(x)
)
) | n.uid
) as uid
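This works like a charm and, in most cases, returns all nodes within a few milliseconds.
However, querying a user that I know is "very well" connected (24k nodes, 40k edges) takes almost 30 seconds.
Example response: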
{
"results": [
{
"columns": [
"uid"
],
"data": [
{"row": [["9974"]], "meta": [null]},
{"row": [["9974"]], "meta": [null]},
{"row": [["9974"]], "meta": [null]},
{"row": [["9974","14367"] ],"meta": [null,null]},
{"row": [["9974","11820"] ],"meta": [null,null]},
{"row": [["9974","11821"] ],"meta": [null,null]},
{"row": [["9974","11822"] ],"meta": [null,null]},
{"row": [["9974","11823"] ],"meta": [null,null]},
{"row": [["9974","9314"] ],"meta": [null,null]},
{"row": [["9974","9313"] ],"meta": [null,null]},
{"row": [["9974","9317"] ],"meta": [null,null]},
{"row": [["9974","14367"] ],"meta": [null,null]},
{"row": [["9974","11820"] ],"meta": [null,null]},
{"row": [["9974","11821"] ],"meta": [null,null]},
{"row": [["9974","11822"] ],"meta": [null,null]},
{"row": [["9974","11823"] ],"meta": [null,null]},
{"row": [["9974","9314"] ],"meta": [null,null]},
{"row": [["9974","9313"] ],"meta": [null,null]},
{"row": [["9974","9317"] ],"meta": [null,null]},
{"row": [["9974","11820","3287" ]],"meta": [null,null,null]},
{"row": [["9974","11820","39584" ]],"meta": [null,null,null]},
{"row": [["9974","11820","5109" ]],"meta": [null,null,null]},
{"row": [["9974","11820","3379" ]],"meta": [null,null,null]},
{"row": [["9974","11820","3288" ]],"meta": [null,null,null]},
--- Snipp ---
Now I want to get rid of all the duplicates and get a result like the following:
{
"results": [
{
"columns": [
"uid"
],
"data": [
{"row": [["9974"]], "meta": [null]},
{"row": [["14367"]], "meta": [null]},
{"row": [["11820"]], "meta": [null]},
{"row": [["11821"]],"meta": [null]},
{"row": [["11822"]],"meta": [null]},
{"row": [["11823"]],"meta": [null]},
{"row": [["9314"]],"meta": [null]},
{"row": [["9313"]],"meta": [null]},
{"row": [["9317"]],"meta": [null]},
{"row": [["14367"]],"meta": [null]},
{"row": [["11820"]],"meta": [null]},
{"row": [["11821"]],"meta": [null]},
{"row": [["11822"]],"meta": [null]},
{"row": [["11823"]],"meta": [null]},
--- snipp ---
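How would I do that? Nice to have: is there a way to make this faster?
Answer 0 (score: 1)
There are a few tweaks that can make this faster.
First, you are iterating over all the nodes of every path (NODES(path)). There will be a lot of duplicate nodes in there, because paths that share a common prefix reuse the same nodes.
Since you are using NODE_GLOBAL uniqueness, the end nodes of all the paths should together make up the entire subgraph. So we can take those end nodes as rows, filter them down to :User nodes (there is dedicated syntax for checking whether a node has a given label), and then get the uids.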
MATCH (u:User {uid: '12345'})
CALL apoc.path.expandConfig(u, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH DISTINCT LAST(NODES(path)) as user
WHERE user:User
RETURN COLLECT(user.uid) as uid
If you don't want the uids in a single collection, just return user.uid at the end instead.
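To see why taking only the last node of each path is enough, here is a small Python sketch of a breadth-first expansion with global node uniqueness. It is only a simulation under assumptions, not APOC's implementation: the function name `bfs_paths_node_global`, the toy graph, and the "starts with u" check standing in for the :User label test are all hypothetical.

```python
from collections import deque


def bfs_paths_node_global(adj, start):
    """Breadth-first expansion that visits each node at most once.

    Returns one path per visited node, including the zero-length path
    that consists of the start node alone.
    """
    seen = {start}
    queue = deque([[start]])
    paths = []
    while queue:
        path = queue.popleft()
        paths.append(path)
        for nxt in adj[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return paths


# Hypothetical toy graph: users u1..u3 linked through a shared email and phone.
adj = {
    "u1": ["e1", "p1"],
    "e1": ["u1", "u2"],
    "p1": ["u1", "u3"],
    "u2": ["e1"],
    "u3": ["p1"],
}

paths = bfs_paths_node_global(adj, "u1")
all_nodes = [n for p in paths for n in p]   # like NODES(path) on every row
end_nodes = [p[-1] for p in paths]          # like LAST(NODES(path))
# Stand-in for the label check `WHERE user:User`.
users = [n for n in end_nodes if n.startswith("u")]
```

In this sketch every node appears exactly once in `end_nodes`, while `all_nodes` repeats the start node once per path. That is the duplication visible in the example response, and it is what the rewritten query avoids.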