Limiting a Cypher query

Time: 2013-07-11 18:52:45

Tags: optimization neo4j cypher

I'm currently using a neo4j database with 50,000 nodes and 2 million relationships to run Cypher MATCH queries like the following:

start startnode = node(42660), endnode = node(30561)
match startnode-[r*1..3]->endnode
return r;

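On its own, this query yields 443 rows, but I only want Cypher to find 5 matches and return just those. To be clear: I don't merely want Cypher to return only 5 results, I also want it to stop the query once it has found 5. I don't want Cypher to compute all 443 results.

Is this currently possible with the LIMIT clause? Or does LIMIT wait until all 443 results have been found and then return only the first 5?

Edit: Does the LIMIT clause also find only the first few results in a more complex query?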

start graphnode = node(1), startnode = node(42660), endnode = node(30561)
match startnode<-[:CONTAINS]-graphnode-[:CONTAINS]->endnode
with startnode, endnode
match startnode-[r1*1..1]->endnode
with r1, startnode, endnode
limit 30
match startnode-[r2*2..2]->endnode
with r1, r2, startnode, endnode
limit 30
match startnode-[r3*3..3]->endnode
with r1, r2, r3, startnode, endnode
limit 30
return r1,r2,r3;

Here is the profile of the query:

==> ColumnFilter(symKeys=["  UNNAMED216", "endnode", "r1", "startnode", "r2", "r3"],   returnItemNames=["r1", "r2", "r3"], _rows=30, _db_hits=0)
==> Slice(limit="Literal(30)", _rows=30, _db_hits=0)
==>   PatternMatch(g="(startnode)-['  UNNAMED216']-(endnode)", _rows=30, _db_hits=0)
==>     ColumnFilter(symKeys=["endnode", "  UNNAMED140", "r1", "startnode", "r2"], returnItemNames=["r1", "r2", "startnode", "endnode"], _rows=1, _db_hits=0)
==>       Slice(limit="Literal(30)", _rows=1, _db_hits=0)
==>         PatternMatch(g="(startnode)-['  UNNAMED140']-(endnode)", _rows=1, _db_hits=0)
==>           ColumnFilter(symKeys=["startnode", "endnode", "  UNNAMED68", "r1"], returnItemNames=["r1", "startnode", "endnode"], _rows=1, _db_hits=0)
==>             Slice(limit="Literal(30)", _rows=1, _db_hits=0)
==>               PatternMatch(g="(startnode)-['  UNNAMED68']-(endnode)", _rows=1, _db_hits=0)
==>                 NodeById(name="Literal(List(30561))", identifier="endnode", _rows=1, _db_hits=1)
==>                   NodeById(name="Literal(List(42660))", identifier="startnode", _rows=1, _db_hits=1)

1 answer:

Answer 0: (score: 2)

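It depends on what you're doing, but in this case, if you add a limit 5 after the return, Cypher should be able to return results lazily and skip the rest of the matches. If you want a sort or an aggregate, it can't do that for you, because every match has to be found before the result can be ordered or aggregated. If you find that this is not the behavior, please report it as an issue on GitHub (along with the version you're using, etc.).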
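As a sketch of what that looks like, here is the query from the question with the limit placed after the return, followed by a sorted variant for contrast. The order by length(r) clause is only an illustration and does not come from the question; it is there to show the kind of query where the limit cannot cut the matching short.

```cypher
// Lazy case: the limit follows the return, so matching can stop
// once 5 rows have been produced.
start startnode = node(42660), endnode = node(30561)
match startnode-[r*1..3]->endnode
return r
limit 5;

// Non-lazy case (illustrative assumption): ordering needs every match
// before the first row can be returned, so all 443 paths are still found.
start startnode = node(42660), endnode = node(30561)
match startnode-[r*1..3]->endnode
return r
order by length(r)
limit 5;
```

Running each form with profiling enabled and comparing the _rows and _db_hits counts on the PatternMatch step is one way to check whether the limit is really being applied lazily on your version.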

Update for the new query:

start graphnode = node(1), startnode = node(42660), endnode = node(30561)
match startnode<-[:CONTAINS]-graphnode-[:CONTAINS]->endnode // do you need this, or is it always going to be true?
with startnode, endnode                                     // ditto. take it out if it doesn't need to be here.
match startnode-[r1*1..1]->endnode // this can probably be simplified to just startnode-[r1]->endnode
with r1, startnode, endnode 
limit 30 // limit to the first 30 it finds in the previous match (this should be lazy)
match startnode-[r2*2..2]->endnode // finds 2 levels deep
with r1, r2, startnode, endnode
limit 30 // limit to the first 30 it finds in the previous match (this should be lazy)
match startnode-[r3*3..3]->endnode
return r1,r2,r3 // the last with you had was extraneous, return will function the same way
limit 30; 

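So, I assume you're asking because this query is slow. May I ask why you're breaking it up this way, rather than just using startnode-[r*1..3]->endnode with limit 30? Do you really need the first match/with, or is that check unnecessary? Can you provide the output of PROFILE?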
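For reference, the single-pattern form mentioned above would look roughly like this. It is a sketch rather than a drop-in replacement: it assumes the CONTAINS check against graphnode isn't needed, and it returns one path per row instead of combinations of r1, r2 and r3, so the shape of the result differs from the staged query.

```cypher
// One variable-length pattern covering depths 1 to 3,
// with the limit after the return so it can be applied lazily.
start startnode = node(42660), endnode = node(30561)
match startnode-[r*1..3]->endnode
return r
limit 30;
```

Note that the staged query only produces rows when a path exists at every one of the three depths, whereas this form returns whatever paths exist at any depth from 1 to 3.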