I'm trying to run this query in Neo4j, but it takes too long (more than 30 minutes, with almost 2,500 nodes and 1.8 million relationships):
MATCH (a:Art)-[r1]->(b:Art)
WITH collect({start: a.url, end: b.url, score: r1.ed_sc}) AS row1
MATCH (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)
WHERE a.url <> c.url
WITH row1 + collect({start: a.url, end: c.url, score: r1.ed_sc * r2.ed_sc}) AS row2
MATCH (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)-[r3]->(d:Art)
WHERE a.url <> c.url AND b.url <> d.url AND a.url <> d.url
WITH row2 + collect({start: a.url, end: d.url, score: r1.ed_sc * r2.ed_sc * r3.ed_sc}) AS allRows
UNWIND allRows AS row
RETURN row.start AS start, row.end AS end, sum(row.score) AS final_score LIMIT 10;
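Here, :Art is a label with 2,500 nodes. There are bidirectional relationships between these nodes, each with a property named ed_sc. So basically I'm trying to find the score between two nodes by traversing paths of one, two, and three hops, and then summing those scores.
Is there a more optimized way to do this?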
Answer 0 (score: 0)
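For one, I'd discourage the use of bidirectional relationships. If your graph is densely connected, this kind of modeling will wreak havoc on most queries, since every connected pair contributes two relationships to each expansion step.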
Assuming url is unique for each :Art node, it would be better to compare the nodes themselves rather than their properties.
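As a rough sketch of that suggestion, the two-hop step of the original query could compare node identities directly instead of reading url on both sides of the filter. It assumes url really is unique per :Art node; the aliases source and target are illustrative names, not anything from the question.

```cypher
// Two-hop paths only; a <> c compares the nodes themselves, not a.url and c.url.
// Assumes url is unique per :Art node, so both filters exclude the same rows.
MATCH (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)
WHERE a <> c
RETURN a.url AS source, c.url AS target, sum(r1.ed_sc * r2.ed_sc) AS score
LIMIT 10
```

The url property is then read only for the rows that are actually returned.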
We should also be able to use a variable-length relationship in place of your current approach:
MATCH p = (start:Art)-[*..3]->(end:Art)
WHERE all(node IN nodes(p) WHERE single(t IN nodes(p) WHERE node = t))
WITH start, end, reduce(score = 1, rel in relationships(p) | score * rel.ed_sc) as score
WITH start, end, sum(score) as final_score
LIMIT 10
RETURN start.url as start, end.url as end, final_score
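Because the graph is dense, it may help to try the pattern on a single start node before running it over all 2,500. The following is only a sketch under that assumption: the $startUrl parameter and the source/target aliases are hypothetical, the accumulator starts at 1.0 on the assumption that ed_sc is numeric, and the ORDER BY is an addition so that LIMIT keeps the highest-scoring pairs rather than an arbitrary ten.

```cypher
// Hypothetical parameter: $startUrl is the url of one :Art node to test with.
MATCH p = (a:Art {url: $startUrl})-[*..3]->(z:Art)
// Keep only paths in which every node occurs exactly once.
WHERE all(n IN nodes(p) WHERE single(m IN nodes(p) WHERE n = m))
// Multiply ed_sc along each path, then sum the products per node pair.
WITH a, z, reduce(score = 1.0, rel IN relationships(p) | score * rel.ed_sc) AS score
RETURN a.url AS source, z.url AS target, sum(score) AS final_score
ORDER BY final_score DESC
LIMIT 10
```

If the running time for one start node is already high, the cost is likely dominated by the path expansion itself rather than by the final aggregation.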