Neo4j Cypher query improvement (performance)

Date: 2017-02-06 13:51:17

Tags: performance neo4j cypher

I have the following Cypher query:

CALL apoc.index.nodes('node_auto_index','pref_label:(Foo)')
YIELD node, weight 
WHERE node.corpus = 'my_corpus'
WITH node, weight 
MATCH (selected:ontoterm{corpus:'my_corpus'})-[:spotted_in]->(:WEBSITE)<-[:spotted_in]-(node:ontoterm{corpus:'my_corpus'}) 
WHERE selected.uri = 'http://uri1' 
      OR selected.uri = 'http://uri2' 
      OR selected.uri = 'http://uri3' 
RETURN DISTINCT node, weight 
ORDER BY weight DESC LIMIT 10

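The first part (up to the WITH) runs very fast (Lucene legacy index) and returns ~100 nodes. The uri property is also unique (selected = 3 nodes). I have ~300 WEBSITE nodes. The execution time is 48749 ms.

Profile: (image of the query plan not included)

How can I restructure the query to improve performance? And why are there ~13.8 million rows in the profile?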

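2 Answers:

Answer 0: (score: 1)

I think the problem was in the WITH clause, which blew up the number of rows. InverseFalcon's answer made the query faster: 49 -> 18 seconds (but still not fast enough). To avoid the huge expansion, I collected the websites first. The following query takes 60 ms: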

MATCH (selected:ontoterm)-[:spotted_in]->(w:WEBSITE)
WHERE selected.uri IN ['http://avgl.net/carbon_terms/Faser', 'http://avgl.net/carbon_terms/Carbon', 'http://avgl.net/carbon_terms/Leichtbau']
AND selected.corpus = 'carbon_terms'
WITH collect(DISTINCT w) AS websites
CALL apoc.index.nodes('node_auto_index','pref_label:(Fas OR Fas*)^10 OR pref_label_deco:(Fas OR Fas*)^3 OR alt_label:(Fa)^5')
YIELD node, weight
WHERE node.corpus = 'carbon_terms' AND node:ontoterm
WITH websites, node, weight
MATCH (node)-[:spotted_in]->(w:WEBSITE)
WHERE w IN websites
RETURN node, weight
ORDER BY weight DESC
LIMIT 10
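
One thing to watch for with this shape of query: a node that is spotted in more than one of the collected websites yields one row per matching website, so the same node may appear several times within the 10 returned rows. A minimal sketch of a variant, assuming duplicates are not wanted, simply brings back the RETURN DISTINCT used in the question. It has not been timed against the data set above, so the 60 ms figure does not necessarily carry over:

```cypher
// Step 1: resolve the three selected terms and collect their websites once.
MATCH (selected:ontoterm)-[:spotted_in]->(w:WEBSITE)
WHERE selected.uri IN ['http://avgl.net/carbon_terms/Faser', 'http://avgl.net/carbon_terms/Carbon', 'http://avgl.net/carbon_terms/Leichtbau']
AND selected.corpus = 'carbon_terms'
WITH collect(DISTINCT w) AS websites
// Step 2: run the legacy index lookup (same Lucene query as above).
CALL apoc.index.nodes('node_auto_index','pref_label:(Fas OR Fas*)^10 OR pref_label_deco:(Fas OR Fas*)^3 OR alt_label:(Fa)^5')
YIELD node, weight
WHERE node.corpus = 'carbon_terms' AND node:ontoterm
WITH websites, node, weight
// Step 3: keep only candidates that share at least one collected website.
MATCH (node)-[:spotted_in]->(w:WEBSITE)
WHERE w IN websites
// DISTINCT collapses the one-row-per-website duplicates (assumption: they are unwanted).
RETURN DISTINCT node, weight
ORDER BY weight DESC
LIMIT 10
```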

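Answer 1: (score: 0)

I don't see any NodeUniqueIndexSeek in your plan, so the selected nodes aren't being looked up efficiently.

Make sure you have a unique constraint on ontoterm(uri).

Once the unique constraint is in place, give this a try: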
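
As a minimal sketch, assuming a Neo4j 3.x server (the current version at the time of this question), the constraint could be created as follows. The variable name o is arbitrary, and newer Neo4j releases use a different form (CREATE CONSTRAINT ... FOR ... REQUIRE ...), so check the syntax for your version:

```cypher
// Neo4j 3.x syntax (assumption: 3.x server). Creating the constraint also
// creates the backing unique index on :ontoterm(uri).
// This fails if two existing ontoterm nodes already share the same uri.
CREATE CONSTRAINT ON (o:ontoterm) ASSERT o.uri IS UNIQUE
```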

PROFILE CALL apoc.index.nodes('node_auto_index','pref_label:(Foo)')
YIELD node, weight 
WHERE node.corpus = 'my_corpus' AND node:ontoterm
WITH node, weight 
MATCH (selected:ontoterm)
WHERE selected.uri in ['http://uri1', 'http://uri2', 'http://uri3']
AND selected.corpus = 'my_corpus'
WITH node, weight, selected
MATCH (selected)-[:spotted_in]->(:WEBSITE)<-[:spotted_in]-(node) 
RETURN DISTINCT node, weight 
ORDER BY weight DESC LIMIT 10

Take a look at the query plan. You should see a NodeUniqueIndexSeek somewhere in there, and hopefully a drop in db hits.
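
If the full plan is hard to read, one way to check the lookup on its own is to profile just the selected part. This is only a sketch reusing the placeholder URIs from the question. With the constraint in place, the plan for this fragment should start from an index seek rather than a label scan followed by a filter:

```cypher
// Profile only the lookup of the selected terms.
// Expectation (assuming the unique constraint on :ontoterm(uri) exists):
// the plan shows a NodeUniqueIndexSeek instead of a NodeByLabelScan.
PROFILE
MATCH (selected:ontoterm)
WHERE selected.uri IN ['http://uri1', 'http://uri2', 'http://uri3']
AND selected.corpus = 'my_corpus'
RETURN selected
```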