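I would like to understand why the query profiler shows only 2000003 db hits. After all, the query needs a full scan over the nodes.
My question is about the following queries: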
WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names
FOREACH (r IN range(0,1000000) | CREATE (:LabelA {username:names[r % size(names)]+r}));
WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names
FOREACH (r IN range(0,1000000) | CREATE (:LabelA:LabelB {username:names[r % size(names)]+r}));
WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names
FOREACH (r IN range(0,1000000) | CREATE (:LabelB {username:names[r % size(names)]+r}));
MATCH (n:LabelA:LabelB) RETURN COUNT(n);
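These create 3000003 nodes in total. To count the nodes that carry particular labels, I would expect a full scan, and therefore 3000003 db hits. However, the profile shows that the first step needs only 2000003 db hits. How is that possible?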
Taken from:
https://maxdemarzi.com/2017/10/25/counting-nodes-with-multiple-labels/
Answer 0 (score: 0)
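Labels are indexed automatically. This means that for the labels LabelA and LabelB, Neo4j internally keeps a list of all nodes that have LabelA and another list of all nodes that have LabelB.
So the "full scan" only has to scan these two indexes. In fact, it only needs to scan one of them and check each node it finds for the second label. In your data each label sits on 2000002 nodes (two batches of 1000001, because range(0,1000000) is inclusive), so scanning one label's list touches about two million nodes rather than all 3000003, which is in line with the 2000003 db hits you see in the first step.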
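To make the scan-one-label-then-filter idea concrete, here is a minimal Python sketch. It is a toy model, not Neo4j code: the dict-based "label index", the function names, and the choice to count one unit of work per scanned node are all assumptions made for illustration, and the batch size is scaled down from 1000001 to 1001 so it runs instantly.

```python
# Toy model of a label scan followed by a label filter.
# NOT Neo4j internals: the dict-of-lists "label index" and counting one
# unit of work per scanned node are assumptions for illustration only.

def build_graph(batch_size):
    """Mimic the three FOREACH statements: A only, A and B, B only."""
    label_index = {"LabelA": [], "LabelB": []}  # label -> list of node ids
    node_labels = {}                            # node id -> set of labels
    node_id = 0
    for labels in (("LabelA",), ("LabelA", "LabelB"), ("LabelB",)):
        for _ in range(batch_size):
            node_labels[node_id] = set(labels)
            for label in labels:
                label_index[label].append(node_id)
            node_id += 1
    return label_index, node_labels


def count_with_both(label_index, node_labels, scan_label, filter_label):
    """Scan only the nodes listed under scan_label, then filter on filter_label."""
    scanned = 0
    matches = 0
    for node_id in label_index[scan_label]:
        scanned += 1  # one unit of work per node taken from the label list
        if filter_label in node_labels[node_id]:
            matches += 1
    return scanned, matches


# range(0, 1000) is inclusive in Cypher, so a scaled-down batch has 1001 nodes.
label_index, node_labels = build_graph(1001)
scanned, matches = count_with_both(label_index, node_labels, "LabelA", "LabelB")
print(len(node_labels), scanned, matches)  # prints: 3003 2002 1001
```

Only two of the three batches are ever visited, because the LabelB-only nodes never appear in the LabelA list. The sketch does not try to reproduce the planner's exact hit count.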
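Also, a 'db hit' is not literally a database read. The documentation describes it as an abstract unit of storage engine work, so treat it as a rough, generic measure of how much work the database did rather than as a count of records read. (This becomes more noticeable in complex queries with complex filters, such as substring matching or finding any node.property with the value 'x'.) (https://neo4j.com/docs/developer-manual/current/cypher/execution-plans/#execution-plans-dbhits)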