Optimizing a cost-based Cypher query

Date: 2016-05-02 08:58:04

Tags: graph neo4j cypher

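In the graph below, each relationship between two stations carries a cost (and the number of the route it belongs to).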

CREATE (n:STATION {name: 'BA'});
CREATE (n:STATION {name: 'BB'});
CREATE (n:STATION {name: 'BC'});
CREATE (n:STATION {name: 'BD'});

MATCH (a:STATION {name: 'BA'}),
      (b:STATION {name: 'BB'})
MERGE (a)-[:ROUTE {route: 1, cost: 10}]->(b)
MERGE (b)-[:ROUTE {route: 1, cost: 10}]->(a);

MATCH (a:STATION {name: 'BA'}),
      (b:STATION {name: 'BC'})
MERGE (a)-[:ROUTE {route: 2, cost: 3}]->(b)
MERGE (a)-[:ROUTE {route: 3, cost: 4}]->(b)
MERGE (b)-[:ROUTE {route: 2, cost: 3}]->(a)
MERGE (b)-[:ROUTE {route: 3, cost: 4}]->(a);

MATCH (a:STATION {name: 'BC'}),
      (b:STATION {name: 'BB'})
MERGE (a)-[:ROUTE {route: 2, cost: 2}]->(b)
MERGE (a)-[:ROUTE {route: 3, cost: 3}]->(b)
MERGE (b)-[:ROUTE {route: 2, cost: 2}]->(a)
MERGE (b)-[:ROUTE {route: 3, cost: 3}]->(a);

MATCH (a:STATION {name: 'BD'}),
      (b:STATION {name: 'BB'})
MERGE (a)-[:ROUTE {route: 4, cost: 2}]->(b)
MERGE (b)-[:ROUTE {route: 4, cost: 2}]->(a);


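When I query with [*..10], I get the correct result, but the query is slow: a variable-length pattern enumerates every possible path up to 10 hops, and the number of such paths grows quickly with depth.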

MATCH p = (a:STATION {name: 'BA'})-[:ROUTE*..10]->(b:STATION {name: 'BB'})
// discard paths that visit the same station twice
WHERE NONE (n IN nodes(p)
            WHERE size([x IN nodes(p) WHERE n = x]) > 1)
// collapse consecutive relationships on the same route into one entry
WITH reduce(acc = [], r IN relationships(p) |
  CASE
    WHEN size(acc) > 0 AND last(acc) = r.route THEN acc
    ELSE acc + r.route
  END) AS reducedRoutes,
reduce(cost = 0, r IN relationships(p) | cost + r.cost) AS routecost
// discard itineraries that come back to a route after leaving it
WHERE NONE (n IN reducedRoutes
            WHERE size([x IN reducedRoutes WHERE n = x]) > 1)
RETURN reducedRoutes, routecost, size(reducedRoutes) AS len
ORDER BY routecost ASC, len ASC

Result:

╒═════════════╤═════════╤═══╕
│reducedRoutes│routecost│len│
╞═════════════╪═════════╪═══╡
│[2]          │5        │1  │
├─────────────┼─────────┼───┤
│[3, 2]       │6        │2  │
├─────────────┼─────────┼───┤
│[2, 3]       │6        │2  │
├─────────────┼─────────┼───┤
│[3]          │7        │1  │
├─────────────┼─────────┼───┤
│[1]          │10       │1  │
└─────────────┴─────────┴───┘
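For reference, the filtering that the [*..10] query performs can be restated as a small Python sketch. It assumes the graph fits in memory as a plain edge list; EDGES and all_itineraries are illustrative names, not part of Neo4j. Like the variable-length pattern, it enumerates every simple path up to the hop limit before filtering and sorting, which is why the work grows so quickly with depth.

```python
from itertools import groupby

# (from, to, route, cost) for every ROUTE relationship created above
EDGES = [
    ("BA", "BB", 1, 10), ("BB", "BA", 1, 10),
    ("BA", "BC", 2, 3), ("BA", "BC", 3, 4),
    ("BC", "BA", 2, 3), ("BC", "BA", 3, 4),
    ("BC", "BB", 2, 2), ("BC", "BB", 3, 3),
    ("BB", "BC", 2, 2), ("BB", "BC", 3, 3),
    ("BD", "BB", 4, 2), ("BB", "BD", 4, 2),
]


def all_itineraries(edges, start, goal, max_hops=10):
    """Exhaustive search mirroring the [*..10] query: every simple path, then filter and sort."""
    out = {}
    for a, b, route, cost in edges:
        out.setdefault(a, []).append((b, route, cost))

    rows = []

    def dfs(node, visited, rels):
        if node == goal and rels:
            # collapse consecutive relationships on the same route (reducedRoutes)
            reduced = [route for route, _ in groupby(r for r, _ in rels)]
            # keep the itinerary only if no route is used twice
            if len(reduced) == len(set(reduced)):
                rows.append((reduced, sum(c for _, c in rels), len(reduced)))
            return  # a simple path cannot leave the goal and come back
        if len(rels) == max_hops:
            return
        for nxt, route, cost in out.get(node, []):
            if nxt not in visited:  # no station may appear twice
                dfs(nxt, visited | {nxt}, rels + [(route, cost)])

    dfs(start, {start}, [])
    # ORDER BY routecost ASC, len ASC
    return sorted(rows, key=lambda row: (row[1], row[2]))


for row in all_itineraries(EDGES, "BA", "BB"):
    print(row)
```

On this graph it yields the same five rows as the table above. The two cost-6 itineraries tie on both sort keys, so their relative order is not meaningful here, just as it is not in Cypher.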

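When I query with allShortestPaths, I get the wrong result, because this kind of shortest path is not what I want: allShortestPaths minimizes the number of relationships in a path, not the total cost, so it only returns the direct one-hop path from BA to BB on route 1 (cost 10).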

MATCH p = allShortestPaths((a:STATION {name: 'BA'})-[*]->(b:STATION {name: 'BB'}))
WHERE NONE (n IN nodes(p)
            WHERE size([x IN nodes(p) WHERE n = x]) > 1)
WITH reduce(acc = [], r IN relationships(p) |
  CASE
    WHEN size(acc) > 0 AND last(acc) = r.route THEN acc
    ELSE acc + r.route
  END) AS reducedRoutes,
reduce(cost = 0, r IN relationships(p) | cost + r.cost) AS routecost
WHERE NONE (n IN reducedRoutes
            WHERE size([x IN reducedRoutes WHERE n = x]) > 1)
RETURN reducedRoutes, routecost, size(reducedRoutes) AS len
ORDER BY routecost ASC, len ASC

Result:

╒═════════════╤═════════╤═══╕
│reducedRoutes│routecost│len│
╞═════════════╪═════════╪═══╡
│[1]          │10       │1  │
└─────────────┴─────────┴───┘
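A minimal breadth-first search in Python makes the hop-count versus cost distinction concrete. It is an illustrative sketch over an in-memory edge list; fewest_hops_path is a hypothetical helper, not a Neo4j function. The fewest-hops path from BA to BB is the direct route-1 relationship costing 10, whereas the cost-5 itinerary needs two hops and so never counts as "shortest".

```python
from collections import deque

# (from, to, route, cost) for every ROUTE relationship in the question
EDGES = [
    ("BA", "BB", 1, 10), ("BB", "BA", 1, 10),
    ("BA", "BC", 2, 3), ("BA", "BC", 3, 4),
    ("BC", "BA", 2, 3), ("BC", "BA", 3, 4),
    ("BC", "BB", 2, 2), ("BC", "BB", 3, 3),
    ("BB", "BC", 2, 2), ("BB", "BC", 3, 3),
    ("BD", "BB", 4, 2), ("BB", "BD", 4, 2),
]


def fewest_hops_path(edges, start, goal):
    """Breadth-first search: minimises the number of relationships, never looks at cost."""
    out = {}
    for a, b, route, cost in edges:
        out.setdefault(a, []).append((b, route, cost))
    parent = {start: None}  # node -> (previous node, route, cost) used to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # walk the parent links back to the start to rebuild the path
            path = []
            while parent[node] is not None:
                prev, route, cost = parent[node]
                path.append((prev, node, route, cost))
                node = prev
            return path[::-1]
        for nxt, route, cost in out.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, route, cost)
                queue.append(nxt)
    return None


path = fewest_hops_path(EDGES, "BA", "BB")
print(len(path), sum(cost for *_, cost in path))  # hops, total cost
```

allShortestPaths returns every path of the minimal hop count, while this sketch returns only one; on this graph that makes no difference, because only one one-hop path from BA to BB exists.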

Is there a way to run a cost-based search in Cypher with better performance?

Also, are there better solutions than Neo4j for this?
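One common alternative to "enumerate everything, then sort" is a best-first (Dijkstra-style) search that expands partial itineraries cheapest first and stops after the first k complete ones. The Python sketch below is an illustration under stated assumptions, not a Neo4j feature: it assumes the relevant part of the graph has been loaded into memory as an edge list, and EDGES and cheapest_itineraries are hypothetical names. It keeps the question's rules (no station twice, no returning to a route once you have left it).

```python
import heapq
from itertools import count

# (from, to, route, cost) for every ROUTE relationship in the question
EDGES = [
    ("BA", "BB", 1, 10), ("BB", "BA", 1, 10),
    ("BA", "BC", 2, 3), ("BA", "BC", 3, 4),
    ("BC", "BA", 2, 3), ("BC", "BA", 3, 4),
    ("BC", "BB", 2, 2), ("BC", "BB", 3, 3),
    ("BB", "BC", 2, 2), ("BB", "BC", 3, 3),
    ("BD", "BB", 4, 2), ("BB", "BD", 4, 2),
]


def cheapest_itineraries(edges, start, goal, k=3):
    """Best-first search over partial itineraries, cheapest first.

    Costs are non-negative and the number of routes never shrinks as a path
    grows, so complete itineraries are popped in (cost, routes) order and
    the search can stop after the first k instead of expanding every path.
    """
    out = {}
    for a, b, route, cost in edges:
        out.setdefault(a, []).append((b, route, cost))

    tie = count()  # unique tie-breaker so heapq never compares the tuples behind it
    # heap entry: (cost so far, number of routes, tie, node, stations visited, routes used)
    heap = [(0, 0, next(tie), start, (start,), ())]
    found = []
    while heap and len(found) < k:
        cost, n_routes, _, node, nodes, routes = heapq.heappop(heap)
        if node == goal and routes:
            found.append((list(routes), cost, n_routes))
            continue
        for nxt, route, step in out.get(node, []):
            if nxt in nodes:
                continue  # no station may appear twice
            if routes and routes[-1] == route:
                new_routes = routes  # staying on the current route
            elif route in routes:
                continue  # would return to a route already left
            else:
                new_routes = routes + (route,)
            heapq.heappush(heap, (cost + step, len(new_routes), next(tie),
                                  nxt, nodes + (nxt,), new_routes))
    return found


for row in cheapest_itineraries(EDGES, "BA", "BB", k=3):
    print(row)
```

With k=3 it returns the same three itineraries as the top of the [*..10] result (the two cost-6 rows tie). The worst case is still exponential, but the search explores in cost order and can stop as soon as it has enough answers. If the route rule can be dropped, Neo4j's APOC library offers a plain weighted shortest path (apoc.algo.dijkstra), but that procedure does not know about route changes.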

1 answer:

Answer 0 (score: 0)

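For cases like this we use a pseudo-ranking (the range of rank can be adjusted to the size of the data). The idea is that each value of rank asks allShortestPaths for the shortest paths that are longer than rank hops, so the loop collects progressively longer candidate paths; paths that visit a station twice are then dropped and the rest are sorted by total cost: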

// one pass per rank; widen RANGE for larger graphs
UNWIND RANGE(0,3) AS rank
  MATCH (from: STATION {name:"BA"})
  MATCH (to: STATION {name:"BB"})
  WITH from, to, rank
    // shortest paths that have more than `rank` hops
    MATCH path = allShortestPaths( (from)-[:ROUTE*]->(to) )
      WHERE LENGTH(path)>rank
    // keep only paths in which every station is distinct
    UNWIND NODES(path) AS rn
    WITH rank, path, COLLECT(DISTINCT rn) AS rns
      WHERE SIZE(rns)=SIZE(NODES(path))
RETURN DISTINCT rank,
       path,
       REDUCE(c = 0, r IN RELATIONSHIPS(path) | c+r.cost) AS routecost
ORDER BY routecost