I started with the following query:
PROFILE
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN PContains
LIMIT 10
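I get "5834 total db hits in 119 ms". The graph correctly shows 9 nodes and the 8 edges connecting them. I then ran a nearly identical query, except that it returns count(distinct()) instead: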
PROFILE
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN count(distinct(SPrimePackage))
LIMIT 10
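This gives "1382270 total db hits in 1771 ms". The result is correct: 8. However, why is count(distinct()) so much slower and more expensive? Should I be doing this some other way?
I am running Neo4j 2.3.1.
Edit 1
To make sure I am comparing apples to apples, and to highlight the problem, here is a pair of similar queries and their results: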
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN SPrimePackage
LIMIT 10
Note that it returns "SPrimePackage" rather than "PContains" as in the original. The result is "5834 total db hits in 740 ms".
Here is the exact same query with "count()":
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN count(SPrimePackage)
LIMIT 10
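Result: "1382270 total db hits in 2731 ms". Note that the only difference is "count()". Intuitively, I would expect "count()" to add a single tallying step, but it evidently does far more than that. Why does "count()" trigger all this extra work?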
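Answer 0 (score: 1)
[UPDATED]
If you compare the PROFILE output of the 2 (edited) queries, you will probably find that the only significant difference is the presence of an EagerAggregation operation in the COUNT() version of the query. Aggregation functions use EagerAggregation to collect in memory all of the data being aggregated before the aggregation function (COUNT() in this case) is actually performed. This takes extra work that is not needed when no aggregation function is used.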
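The effect can be sketched without any graph data. The two queries below are only an illustration, not the queries from the question: they assume nothing beyond the standard UNWIND clause and range() function. Without aggregation, rows are streamed and LIMIT can stop the query after 10 rows. With COUNT(), every input row has to be consumed before the single result row exists, and LIMIT is then applied to that one row.

```cypher
// Streaming case: LIMIT can end execution once 10 rows have been produced.
PROFILE
UNWIND range(1, 1000000) AS i
RETURN i
LIMIT 10;

// Aggregating case: COUNT() must consume all 1000000 rows first;
// LIMIT 10 is then applied to the single aggregated row.
PROFILE
UNWIND range(1, 1000000) AS i
RETURN COUNT(i)
LIMIT 10;
```

Neither query touches the store, so the difference should show up in the Rows column of the two profiles rather than in DB Hits.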
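The following query still uses COUNT() to get the count, but it greatly reduces the data that must be aggregated, and therefore the amount of work that needs to be done in the EagerAggregation step: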
PROFILE
MATCH (SBase:Snapshot { timestamp:1454983481.304583 })
USING INDEX SBase:Snapshot(timestamp)
WHERE (SBase)-[:contains]->()
MATCH (s:Snapshot { timestamp:1454983521.642284 })-[:contains]->(SPrimePackage)
USING INDEX s:Snapshot(timestamp)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN COUNT(DISTINCT SPrimePackage)
LIMIT 10;
The above query assumes that you have created an index on :Snapshot(timestamp), which greatly speeds up finding the 2 :Snapshot nodes:
CREATE INDEX ON :Snapshot(timestamp);
With some simple sample data, the profile I get is:
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| Operator          | Estimated Rows | Rows | DB Hits | Variables                            | Other                                |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| +ProduceResults   |              1 |    1 |       0 | COUNT(DISTINCT SPrimePackage)        | COUNT(DISTINCT SPrimePackage)        |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +Limit            |              1 |    1 |       0 | COUNT(DISTINCT SPrimePackage)        | Literal(10)                          |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +EagerAggregation |              1 |    1 |       0 | COUNT(DISTINCT SPrimePackage)        |                                      |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +AntiSemiApply    |              1 |    7 |       0 | anon[180], s -- SBase, SPrimePackage |                                      |
| |\                +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(Into)   |              1 |    0 |      34 | anon[266] -- SBase, SPrimePackage    | (SBase)-[:contains]->(SPrimePackage) |
| | |               +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument       |              4 |    8 |       0 | SBase, SPrimePackage                 |                                      |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +CartesianProduct |              4 |    8 |       0 | SBase -- anon[180], SPrimePackage, s |                                      |
| |\                +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All)    |              4 |    8 |      10 | anon[180], SPrimePackage -- s        | (s)-[:contains]->(SPrimePackage)     |
| | |               +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +NodeIndexSeek  |              2 |    2 |       4 | s                                    | :Snapshot(timestamp)                 |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +SemiApply        |              1 |    2 |       0 | SBase                                |                                      |
| |\                +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All)    |              4 |    0 |       2 | anon[112], anon[126] -- SBase        | (SBase)-[:contains]->()              |
| | |               +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument       |              2 |    2 |       0 | SBase                                |                                      |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +NodeIndexSeek    |              2 |    2 |       3 | SBase                                | :Snapshot(timestamp)                 |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
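In addition to using the index, the above query does not match all of the nodes contained by SBase in its first MATCH clause, because we only need to find a single contained node to identify a matching SBase node. The SemiApply operation for the (SBase)-[:contains]->() predicate finishes as soon as a single match is found, so the first MATCH clause produces a single row per SBase instead of N rows. From the information in your question, I suspect that N would be 8.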
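To check the row-count difference on your own data, a pair of queries along these lines could be used. This is only a sketch: it reuses the timestamp from the question, and the alias rowCount is just an illustrative name.

```cypher
// Pattern in MATCH: one row per (SBase)-[:contains]->() relationship (N rows).
MATCH (SBase:Snapshot { timestamp:1454983481.304583 })-[:contains]->()
RETURN COUNT(*) AS rowCount;

// Pattern as a WHERE predicate: at most one row per SBase node.
MATCH (SBase:Snapshot { timestamp:1454983481.304583 })
WHERE (SBase)-[:contains]->()
RETURN COUNT(*) AS rowCount;
```

The first count is the N that multiplies the work of the second MATCH clause in the original query; the second should be 1 if a single :Snapshot node has that timestamp.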