Neo4j - Why does the profiler show only 2000003 db hits when a full node scan is needed?

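Time: 2018-05-04 16:57:21

Tags: neo4j cypher

I would like to understand why the query profiler shows only 2000003 db hits. After all, the query should require a full scan over the nodes.

My question concerns the following queries: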

WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names 
    FOREACH (r IN range(0,1000000) | CREATE (:LabelA {username:names[r % size(names)]+r}))

WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names 
    FOREACH (r IN range(0,1000000) | CREATE (:LabelA:LabelB {username:names[r % size(names)]+r}))

WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names 
    FOREACH (r IN range(0,1000000) | CREATE (:LabelB {username:names[r % size(names)]+r}))

MATCH (n:LabelA:LabelB) RETURN COUNT(n)

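These statements create 3000003 nodes. To count the nodes that carry a specific combination of labels, I would expect a full scan, and therefore 3000003 db hits. However, the profile shows that the first step takes only 2000003 db hits. How is that possible?

Adapted from: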

https://maxdemarzi.com/2017/10/25/counting-nodes-with-multiple-labels/

1 answer:

Answer 0: (score: 0)

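Labels are indexed automatically. This means that for the labels LabelA and LabelB, Neo4j internally keeps a list of all nodes that have LabelA and another list of all nodes that have LabelB.

So the "full scan" only needs to scan these two indexes. In fact, it only needs to scan one of them and check each node it finds for the second label.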
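To make the numbers concrete, here is a rough Python sketch of that idea. It is not how Neo4j is implemented: the function names `label_lists` and `count_both` are made up for illustration, and node ids are assumed to be handed out in creation order. It only models "one list of node ids per label, scan one list, check the other".

```python
def label_lists(n_per_statement):
    """Model the per-label node lists after the three CREATE statements.

    Assumption: ids are assigned in creation order, so the first statement
    makes LabelA-only nodes, the second LabelA+LabelB, the third LabelB-only.
    """
    n = n_per_statement
    return {
        "LabelA": range(0, 2 * n),  # statements 1 and 2
        "LabelB": range(n, 3 * n),  # statements 2 and 3
    }


def count_both(lists, scan_label, filter_label):
    """Scan one label list and check each node for the other label."""
    visited = 0
    matches = 0
    for node_id in lists[scan_label]:
        visited += 1
        if node_id in lists[filter_label]:
            matches += 1
    return matches, visited


# Cypher's range(0,1000000) includes both ends, so each statement
# creates 1000001 nodes.
n = len(range(0, 1000000 + 1))
total_nodes = 3 * n
matches, visited = count_both(label_lists(n), "LabelA", "LabelB")

print(total_nodes)  # 3000003 nodes in the whole store
print(visited)      # 2000002 nodes carry LabelA and get scanned
print(matches)      # 1000001 of them also carry LabelB
```

Only the 2000002 nodes in the LabelA list are visited, not all 3000003. That is one short of the 2000003 db hits in the profile. My understanding (an assumption, I have not checked the source) is that the label scan operator in this Neo4j version reports one db hit more than the rows it returns.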

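Also, a "db hit" is not literally a database read. It is an abstract "unit of database time", so treat it as a generic measure of the system I/O time used. (This matters more in complex queries with complex filters, such as substring matching or finding any node.property whose value is 'x'.) (https://neo4j.com/docs/developer-manual/current/cypher/execution-plans/#execution-plans-dbhits)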