Improving Neo4j Cypher performance on a lengthy match

Time: 2013-11-22 20:39:46

Tags: neo4j cypher

Setup:

  • Neo4j - 1.9.3
  • ~7,000 nodes
  • ~1.8 million relationships

I have the following Cypher query, and I would like to improve its performance:

START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);

Returns 471 (188171 ms).

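Right now I only fetch a count, but later I may want the values themselves (471 of them in this example). The problem is that the query takes about 3-4 minutes to run.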

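The graph is densely connected, with many relationships. Running the following shows how many edges of type knowledge are attached to node a (2).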

START a=node(2) MATCH (a)-[:knowledge]-(x) RETURN COUNT(a);

Returns 4350 (103 ms).

To me, that does not seem like many edges to check. Can I split the query up somehow to improve performance?

Edit: As requested in the comments, here are the results of running the query with profile:

profile START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);
==> +---------------------+
==> | COUNT(DISTINCT end) |
==> +---------------------+
==> | 471                 |
==> +---------------------+
==> 1 row
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a"], returnItemNames=["COUNT(DISTINCT end)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a,Distinct)"], _rows=1, _db_hits=0)
==>   TraversalMatcher(trail="(a)-[  UNNAMED7:knowledge WHERE true AND true]-(x)-[  UNNAMED8:depends WHERE true AND true]-(y)-[  UNNAMED9:knowledge WHERE true AND true]-(end)", _rows=25638262, _db_hits=25679365)
==>     ParameterPipe(_rows=1, _db_hits=0)

1 Answer:

Answer 0 (score: 2):

I ended up doing the following to improve performance:

profile START a=node(2) MATCH (a)-[:knowledge]-(x) WITH DISTINCT x MATCH (x)-[:depends]-(y) WITH DISTINCT y MATCH (y)-[:knowledge]-(end) WITH DISTINCT end RETURN COUNT(end);
==> +------------+
==> | COUNT(end) |
==> +------------+
==> | 471        |
==> +------------+
==> 1 row
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE1967576a-d357-457a-b799-adbb16b93048"], returnItemNames=["COUNT(end)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE1967576a-d357-457a-b799-adbb16b93048,Count)"], _rows=1, _db_hits=0)
==>   Distinct(_rows=471, _db_hits=0)
==>     PatternMatch(g="(end)-['  UNNAMED3']-(y)", _rows=403437, _db_hits=0)
==>       Distinct(_rows=735, _db_hits=0)
==>         PatternMatch(g="(x)-['  UNNAMED2']-(y)", _rows=1653, _db_hits=0)
==>           Distinct(_rows=177, _db_hits=0)
==>             TraversalMatcher(trail="(a)-[  UNNAMED1:knowledge WHERE true AND true]-(x)", _rows=4350, _db_hits=4351)
==>               ParameterPipe(_rows=1, _db_hits=0)

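Making each step in the chain DISTINCT reduces the overall work: each subsequent MATCH starts only from the de-duplicated nodes rather than from every path that reaches them. In the profile above, the 4350 rows from the first hop collapse to 177 distinct x, and the 1653 rows from the second hop collapse to 735 distinct y, so the final hop produces 403437 rows instead of the 25638262 rows produced by the single-pattern query.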
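To illustrate why de-duplicating between hops helps, here is a minimal Python sketch on a tiny made-up graph. It is not how Neo4j executes the query; the node names, the `adjacency`, `expand_paths` and `expand_distinct` helpers, and the toy edges are all assumptions for illustration. It also ignores Cypher's rule that a relationship is not reused within a single pattern.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def expand_paths(start, hops):
    """Follow every path and keep duplicates, like one long MATCH pattern."""
    rows = [start]
    for adj in hops:
        rows = [nxt for node in rows for nxt in adj[node]]
    return set(rows), len(rows)


def expand_distinct(start, hops):
    """De-duplicate after each hop, like WITH DISTINCT between MATCH clauses."""
    frontier = {start}
    rows_per_hop = []
    for adj in hops:
        rows = [nxt for node in frontier for nxt in adj[node]]
        rows_per_hop.append(len(rows))  # rows produced before de-duplication
        frontier = set(rows)
    return frontier, rows_per_hop


# Hypothetical toy graph: a knows x1..x3, every x depends on y1 and y2,
# and each y knows e1 and e2.
knowledge = adjacency([("a", "x1"), ("a", "x2"), ("a", "x3"),
                       ("y1", "e1"), ("y1", "e2"), ("y2", "e1"), ("y2", "e2")])
depends = adjacency([(x, y) for x in ("x1", "x2", "x3") for y in ("y1", "y2")])
hops = [knowledge, depends, knowledge]

ends_all, path_rows = expand_paths("a", hops)
ends_distinct, rows_per_hop = expand_distinct("a", hops)
print(sorted(ends_all), path_rows)
print(sorted(ends_distinct), rows_per_hop)
```

Both functions reach the same end nodes, but the path version carries 12 rows into the final aggregation while the de-duplicated version produces only 4 rows on its last hop. On a densely connected graph that gap grows multiplicatively with each hop, which matches the difference in _rows between the two profiles.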