How can I make this Neo4j Cypher query run faster?

Time: 2014-06-14 13:09:03

Tags: neo4j cypher

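I have the following Cypher query in Neo4j. It fetches all the nodes in the graph and their connections into a JSON file, which is then displayed as a graph using the Sigma.js library.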

MATCH (c1:Concept), (c2:Concept), (ctx:Context), c1-[rel:TO]->c2 
WHERE (rel.user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7' 
AND ctx.uid = rel.context) 
WITH DISTINCT c1, c2 
MATCH (ctxname:Context), c1-[relall:TO]->c2 
WHERE (relall.user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7' 
AND ctxname.uid = relall.context) 
RETURN DISTINCT 
c1.uid AS source_id, 
c1.name AS source_name, 
c2.uid AS target_id, 
c2.name AS target_name, 
relall.uid AS edge_id, 
ctxname.name AS context_name, 
relall.statement AS statement_id, 
relall.weight AS weight;

This particular query returns 89 rows of data.

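Strangely, when the number of c1 and c2 nodes and rel relationships is small, it runs relatively fast. However, as the number of these nodes and of the relationships between them grows, the query becomes very slow, probably because Neo4j has to iterate over a lot of relationships.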

Do you know how I could make this query faster, given that I need the data returned in the same format and it has to be done in a single query?

Here is the profile information:

Distinct(_rows=89, _db_hits=0)
Extract(symKeys=["c1", "c2", "ctxname", "relall"], exprKeys=["source_name", "statement_id", "edge_id", "target_id", "source_id", "target_name", "context_name", "weight"], _rows=89, _db_hits=712)

Filter(pred="(Property(relall,user(8)) == Literal(9d6e7140-f3c3-11e3-927f-1f5ca4210ac7) AND Property(ctxname,uid(1)) == Property(relall,context(7)))", _rows=89, _db_hits=267)
SimplePatternMatcher(g="(c1)-['relall']-(c2)", _rows=89, _db_hits=2166150)
NodeByLabel(identifier="ctxname", _db_hits=0, _rows=44100, label="Context", identifiers=["ctxname"], producer="NodeByLabel")

Distinct(_rows=84, _db_hits=0)
Filter(pred="Property(ctx,uid(1)) == Property(rel,context(7))", _rows=89, _db_hits=93450)
        NodeByLabel(identifier="ctx", _db_hits=0, _rows=46725, label="Context", identifiers=["ctx"], producer="NodeByLabel")
          Filter(pred="hasLabel(c2:Concept(1))", _rows=89, _db_hits=0)
            TraversalMatcher(start={"label": "Concept", "producer": "NodeByLabel", "identifiers": ["c1"]}, trail="(c1)-[rel:TO WHERE hasLabel(NodeIdentifier():Concept(1)) AND Property(RelationshipIdentifier(),user(8)) == Literal(9d6e7140-f3c3-11e3-927f-1f5ca4210ac7)]->(c2)", _rows=89, _db_hits=127572)

Thanks for any help you can give, or at least for pointing out the weak spots of this query, judging from the profile information above...

2 answers:

Answer 0 (score: 1)

Your relationship is a "hyperedge" and should be a node; you know that from past discussions :)

Since you have no index lookup for a starting point, this query has to scan the entire graph.

Enable relationship-auto-index for the field user, and start this query with a relationship lookup.

Your Context is also matched again for every relationship it finds. Not sure whether you expect multiple contexts to match?

Also make sure you have an index on :Context(uid).

START rel = relationship:relationship_auto_index(user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7')
WHERE type(rel) = "TO"
WITH rel, startNode(rel) as c1, endNode(rel) as c2
WHERE (c1:Concept) AND (c2:Concept)
MATCH (ctx:Context)
WHERE ctx.uid = rel.context
WITH DISTINCT c1, c2 
MATCH c1-[relall:TO]->c2 
WHERE (relall.user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7') 
MATCH (ctxname:Context)
WHERE ctxname.uid = relall.context
RETURN DISTINCT 
c1.uid AS source_id, 
c1.name AS source_name, 
c2.uid AS target_id, 
c2.name AS target_name, 
relall.uid AS edge_id, 
ctxname.name AS context_name, 
relall.statement AS statement_id, 
relall.weight AS weight;
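The :Context(uid) index mentioned above could be created roughly as follows. This is a minimal sketch that assumes a Neo4j 2.x server, where schema indexes on a label and property use the CREATE INDEX ON syntax; later versions changed this syntax. It covers only the node index, not the relationship auto-index on user, which is configured on the server rather than through Cypher.

```cypher
// Assumption: Neo4j 2.x schema index syntax.
// Lets lookups like ctx.uid = rel.context use an index instead of scanning every :Context node.
CREATE INDEX ON :Context(uid);
```

Once the index is online, re-running the query with PROFILE should show whether the lookup on Context still accounts for most of the db hits.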

Answer 1 (score: 1)

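First, I'd suggest inlining as much information as you can. The Cypher planner can use inline matches more efficiently. (That is, it has more options for how to find the items, because the relationships are more explicit.)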

Second, fewer matches are better, because the planner can do a better job of planning to avoid touching nodes it doesn't need.

After that, only indexes will help performance, namely on TO.user and Context.uid. (With those indexes, this should be just a few quick backend database fetches.)

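Here is your same query, but with the where ... and ... statements converted to inline matches. I've added comments, but you should also remove everything above my last comment, since it is wasted computation that will only confuse the Cypher planner (as far as this example query is concerned).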

MATCH (c1:Concept)-[rel:TO{user:'9d6e7140-f3c3-11e3-927f-1f5ca4210ac7'}]->(c2:Concept), (ctx:Context{uid:rel.context})
// Wait, why did we match ctx then?
WITH DISTINCT c1, c2
// We just did this... This match makes everything above it redundant...
MATCH (c1:Concept)-[relall:TO{user:'9d6e7140-f3c3-11e3-927f-1f5ca4210ac7'}]->(c2:Concept), (ctxname:Context{uid:relall.context})
RETURN DISTINCT 
c1.uid AS source_id, 
c1.name AS source_name, 
c2.uid AS target_id, 
c2.name AS target_name, 
relall.uid AS edge_id, 
ctxname.name AS context_name, 
relall.statement AS statement_id, 
relall.weight AS weight;
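Following that advice, a trimmed version with everything above the last comment removed might look like the sketch below. It has not been run against the asker's data. It puts the Context lookup in its own MATCH so that relall is already bound when relall.context is read, which is a choice made here rather than part of the original answer. It should return the same rows, because every row produced by the second match already satisfies the first one.

```cypher
// Sketch only: the redundant first MATCH and WITH DISTINCT are dropped.
MATCH (c1:Concept)-[relall:TO {user:'9d6e7140-f3c3-11e3-927f-1f5ca4210ac7'}]->(c2:Concept)
// Separate MATCH so relall is bound before its context property is used.
MATCH (ctxname:Context {uid: relall.context})
RETURN DISTINCT
c1.uid AS source_id,
c1.name AS source_name,
c2.uid AS target_id,
c2.name AS target_name,
relall.uid AS edge_id,
ctxname.name AS context_name,
relall.statement AS statement_id,
relall.weight AS weight;
```

Comparing the PROFILE output of this version with the original would confirm whether the extra pass was in fact the expensive part.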