This is the query I am trying to run in Neo4j, but it takes too long:

Time: 2018-06-05 09:05:16

Tags: optimization neo4j cypher time-complexity graph-databases

I am trying to run this query with Neo4j, but it takes too long (more than 30 minutes, on nearly 2,500 nodes and 1.8 million relationships):

MATCH (a:Art)-[r1]->(b:Art)
WITH collect({start: a.url, end: b.url, score: r1.ed_sc}) AS row1

MATCH (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)
WHERE a.url <> c.url
WITH row1 + collect({start: a.url, end: c.url, score: r1.ed_sc * r2.ed_sc}) AS row2

MATCH (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)-[r3]->(d:Art)
WHERE a.url <> c.url AND b.url <> d.url AND a.url <> d.url
WITH row2 + collect({start: a.url, end: d.url, score: r1.ed_sc * r2.ed_sc * r3.ed_sc}) AS allRows

UNWIND allRows AS row
RETURN row.start AS start, row.end AS end, sum(row.score) AS final_score LIMIT 10;

Here :Art is a label with 2,500 nodes. The nodes are connected by bidirectional relationships that carry a property named ed_sc. So essentially I am trying to compute a score between two nodes by traversing one-, two-, and three-hop paths, and then summing the scores of those paths.

Is there a more optimized way to do this?
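
For reference, a minimal sketch of what this model might look like (the :SIM relationship type and the sample urls and ed_sc values are illustrative assumptions, not taken from the actual dataset):

// two :Art nodes per pair, related in both directions, each relationship carrying ed_sc
CREATE (a:Art {url: 'a'}), (b:Art {url: 'b'}), (c:Art {url: 'c'}),
       (a)-[:SIM {ed_sc: 0.8}]->(b), (b)-[:SIM {ed_sc: 0.8}]->(a),
       (b)-[:SIM {ed_sc: 0.5}]->(c), (c)-[:SIM {ed_sc: 0.5}]->(b);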

1 Answer:

Answer 0: (score: 0)

For one, I would discourage the use of bidirectional relationships. If your graph is densely connected, that kind of modeling will wreak havoc on most queries.
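
For example, storing a single relationship per pair and matching it without an arrow lets you traverse it in either direction anyway (a sketch, using a hypothetical :SIM relationship type):

// the undirected pattern matches the relationship from either endpoint
MATCH (a:Art)-[r:SIM]-(b:Art)
RETURN a.url, b.url, r.ed_sc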

Assuming url is unique for each :Art node, it would be better to compare the nodes themselves rather than their properties.
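
In other words, comparing the node variables directly (a minimal sketch) avoids the per-row property lookups of a.url <> c.url:

MATCH (a:Art)-->(b:Art)-->(c:Art)
WHERE a <> c
RETURN a.url, c.url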

We should also be able to use a variable-length relationship pattern in place of your current approach:

MATCH p = (start:Art)-[*..3]->(end:Art)
// keep only paths in which every node occurs exactly once (no revisited nodes)
WHERE all(node IN nodes(p) WHERE single(t IN nodes(p) WHERE node = t))
// multiply the ed_sc values along each path
WITH start, end, reduce(score = 1, rel IN relationships(p) | score * rel.ed_sc) AS score
// sum the per-path scores for each start/end pair
WITH start, end, sum(score) AS final_score
LIMIT 10
RETURN start.url AS start, end.url AS end, final_score
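
One note on the LIMIT: without an ORDER BY, the 10 rows returned are arbitrary. If the goal is the 10 highest-scoring pairs, a variant along these lines (a sketch of the same query with ordering added) would make that explicit:

MATCH p = (start:Art)-[*..3]->(end:Art)
WHERE all(node IN nodes(p) WHERE single(t IN nodes(p) WHERE node = t))
WITH start, end, reduce(score = 1, rel IN relationships(p) | score * rel.ed_sc) AS score
WITH start, end, sum(score) AS final_score
ORDER BY final_score DESC
LIMIT 10
RETURN start.url AS start, end.url AS end, final_score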