Summing all attributes in Neo4j

Time: 2016-11-23 17:36:20

Tags: performance neo4j path

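I'm asking for your help here. I have a Neo4j database containing a complete k-ary tree with millions of nodes. Nodes and edges each have a constant number of attributes (the count may differ between nodes and edges, but every node has exactly x attributes and every edge has exactly y). My task is to return the sum of each attribute over the nodes and over the edges. I tried this query: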

Match p=(:Vertex_ss3 {name:'vertex_1594320'})-[:EDGE_ss3*]->(:Vertex_ss3 {name:'vertex_1'}) 
return 
reduce(sum = 0, n IN nodes(p) | sum + n.attr1) as tot_attr1_node,
reduce(sum = 0, n IN nodes(p) | sum + n.attr2) as tot_attr2_node, 
reduce(sum = 0, n IN nodes(p) | sum + n.attr3) as tot_attr3_node,
reduce(sum = 0, n IN nodes(p) | sum + n.attr4) as tot_attr4_node, 
reduce(sum = 0, n IN nodes(p) | sum + n.attr5) as tot_attr5_node,
reduce(sum = 0, n IN nodes(p) | sum + n.attr6) as tot_attr6_node,
reduce(sum = 0, n IN relationships(p) | sum + n.attr1) as tot_attr1_edge,
reduce(sum = 0, n IN relationships(p) | sum + n.attr2) as tot_attr2_edge,
reduce(sum = 0, n IN relationships(p) | sum + n.attr3) as tot_attr3_edge,
reduce(sum = 0, n IN relationships(p) | sum + n.attr4) as tot_attr4_edge,
reduce(sum = 0, n IN relationships(p) | sum + n.attr5) as tot_attr5_edge

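With a 3-ary tree of depth 13, the query takes about 13 to 14 seconds to return. Is there a way to improve the time?

I don't really know how nodes(p) and relationships(p) work, but from the way I wrote the query, it seems that for every attribute the database has to retrieve all the nodes or all the relationships from the path again. Isn't there a way to do this once and for all?

Thanks for your advice :)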

2 answers:

Answer 0: (score: 3)

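You should be able to use one of the APOC procedures that can execute a query in parallel (with different data). Unfortunately, they are poorly documented (some are not documented at all). So I'll provide a semi-tutorial on one of these procedures and how it might help you get faster results.

The query below shows how to use apoc.cypher.mapParallel to sum each node attribute in parallel (at least potentially; the procedure determines the actual degree of parallelism), and then to sum each relationship attribute in parallel.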

MATCH p=(:Vertex_ss3 {name:'vertex_1594320'})-[:EDGE_ss3*]->(:Vertex_ss3 {name:'vertex_1'}) 
CALL apoc.cypher.mapParallel(
  'UNWIND nodes AS n RETURN _ AS attr, SUM(n[_]) AS sum',
  {nodes: NODES(p)},
  ['attr1','attr2','attr3','attr4','attr5','attr6']) YIELD value AS nodeAttr
WITH p, COLLECT(nodeAttr) AS nodeAttrs
CALL apoc.cypher.mapParallel(
  'UNWIND rels AS n RETURN _ AS attr, SUM(n[_]) AS sum',
  {rels: RELATIONSHIPS(p)},
  ['attr1','attr2','attr3','attr4','attr5']) YIELD value AS relAttr
RETURN nodeAttrs, COLLECT(relAttr) AS relAttrs;
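  • The first argument to the procedure is the Cypher query that you want to run in parallel.

  • The second argument to the procedure defines the parameters passed to that Cypher query. For each parameter, the procedure creates an identifier with the same name (for example, the value of the "{foo}" parameter is accessible only through the foo identifier).

  • The procedure ensures that the underscore ("_") identifier holds the value of one element from the list passed to the procedure (its last argument).

  • nodeAttrs and relAttrs will be collections of {attr: ..., sum: ...} maps.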
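To make the shape of the result concrete, here is a minimal plain-Python model of what the query above computes. It is not APOC's implementation and does not talk to Neo4j: the helper names (sum_attr, map_parallel) and the sample property maps are made up for illustration, and a thread pool merely stands in for whatever parallelism the procedure chooses to use.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_attr(attr, items):
    """Model of 'UNWIND items AS n RETURN _ AS attr, SUM(n[_]) AS sum'.

    `attr` plays the role of the "_" identifier: one element of the list
    passed as the procedure's last argument.
    """
    # Cypher's SUM() ignores nulls, so missing properties are skipped here too.
    total = sum(item[attr] for item in items if item.get(attr) is not None)
    return {"attr": attr, "sum": total}


def map_parallel(items, attrs):
    """Run sum_attr once per attribute name, potentially in parallel."""
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in input order; that ordering is a property
        # of this sketch, not something assumed about the real procedure.
        return list(pool.map(lambda a: sum_attr(a, items), attrs))


# Hypothetical stand-ins for NODES(p): property maps of three nodes on a path.
nodes = [
    {"attr1": 1, "attr2": 10},
    {"attr1": 2, "attr2": 20},
    {"attr1": 3, "attr2": 30},
]

node_attrs = map_parallel(nodes, ["attr1", "attr2"])
print(node_attrs)  # [{'attr': 'attr1', 'sum': 6}, {'attr': 'attr2', 'sum': 60}]
```

Each attribute name becomes one independent task over the same list of nodes, which is why the work can be split across threads; whether that actually beats the single-threaded reduce() version on a real database would have to be measured.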

Answer 1: (score: 0)

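Not sure whether this is faster or slower, but you could try unwinding the nodes and relationships and then using the SUM() function to get the values you need.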

MATCH p=(:Vertex_ss3 {name:'vertex_1594320'})-[:EDGE_ss3*]->(:Vertex_ss3 {name:'vertex_1'}) 
WITH p
// Expand the path's nodes into rows, then aggregate once per path
UNWIND nodes(p) as n
WITH p, 
SUM(n.attr1) as tot_attr1_node,
SUM(n.attr2) as tot_attr2_node,
SUM(n.attr3) as tot_attr3_node,
SUM(n.attr4) as tot_attr4_node,
SUM(n.attr5) as tot_attr5_node,
SUM(n.attr6) as tot_attr6_node
// Do the same for the relationships, carrying the node totals along
UNWIND relationships(p) as n
WITH p, tot_attr1_node, tot_attr2_node, tot_attr3_node, tot_attr4_node, tot_attr5_node, tot_attr6_node,
SUM(n.attr1) as tot_attr1_edge,
SUM(n.attr2) as tot_attr2_edge,
SUM(n.attr3) as tot_attr3_edge,
SUM(n.attr4) as tot_attr4_edge,
SUM(n.attr5) as tot_attr5_edge
RETURN tot_attr1_node, 
tot_attr2_node, 
tot_attr3_node, 
tot_attr4_node, 
tot_attr5_node, 
tot_attr6_node, 
tot_attr1_edge, 
tot_attr2_edge, 
tot_attr3_edge, 
tot_attr4_edge, 
tot_attr5_edge