Why does my Cypher query take 10x longer when run with count()?

Time: 2016-02-29 23:28:34

Tags: neo4j cypher

I started with the following query:

PROFILE
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN PContains
LIMIT 10

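I get "5834 total db hits in 119 ms". The graph correctly shows 9 nodes and the 8 edges connecting them. Then I run an almost identical query, except that I return count(distinct()) instead: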

PROFILE
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN count(distinct(SPrimePackage))
LIMIT 10

This gives "1382270 total db hits in 1771 ms". The result is correct: 8. But why is count(distinct()) so much slower and more expensive? Should I be doing this some other way?

I am running Neo4j 2.3.1.

Edit 1

To make sure I am comparing apples to apples, and to highlight the problem, here is a pair of similar queries and their results:

MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN SPrimePackage
LIMIT 10

Note that this returns "SPrimePackage" rather than "PContains" as in the original. The result is "5834 total db hits in 740 ms".

Here is the exact same query with "count()":

MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN count(SPrimePackage)
LIMIT 10

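Result: "1382270 total db hits in 2731 ms". Note that the only difference is "count()". Intuitively, I would expect "count()" to add a tallying step, but it clearly does far more than that. Why does "count()" trigger all this extra work?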

1 answer:

Answer 0 (score: 1)

[UPDATED]

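If you compare the PROFILE output of your 2 (edited) queries, you will probably find that the only significant difference is the presence of an EagerAggregation operation in the COUNT() version. Aggregation functions use EagerAggregation to collect in memory all the data to be aggregated before the aggregation function (COUNT(), in this case) is actually performed. This requires extra work that is not needed when no aggregation function is used.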
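
One detail worth spelling out: LIMIT applies to the rows that RETURN produces, and COUNT() produces a single row, so the LIMIT 10 in the COUNT() queries cannot cut the matching short the way it does when individual rows are returned. As a diagnostic sketch only (it changes the result, because it counts at most 10 rows, and I have not profiled it against your data), moving the limit in front of the aggregation should bring the db hits back close to those of the non-aggregating query:

```cypher
MATCH Base = (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
MATCH Prime = (:Snapshot {timestamp:1454983521.642284})-[PContains:contains]->(SPrimePackage)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
// Cap the rows BEFORE aggregating; COUNT() then sees at most 10 rows.
WITH SPrimePackage
LIMIT 10
RETURN count(SPrimePackage)
```

If the two profiles then look alike, the extra cost comes from having to produce every matching row for the aggregation, not from the counting itself.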

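The following query still uses COUNT() to get the count, but it greatly reduces the data that has to be aggregated, and so reduces the amount of work that needs to be done in the EagerAggregation step: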

PROFILE
MATCH (SBase:Snapshot { timestamp:1454983481.304583 })
USING INDEX SBase:Snapshot(timestamp)
WHERE (SBase)-[:contains]->()
MATCH (s:Snapshot { timestamp:1454983521.642284 })-[:contains]->(SPrimePackage)
USING INDEX s:Snapshot(timestamp)
WHERE NOT (SBase)-[:contains]->(SPrimePackage)
RETURN COUNT(DISTINCT SPrimePackage)
LIMIT 10;

The above query assumes that you have created an index on :Snapshot(timestamp), which greatly speeds up the search for the 2 :Snapshot nodes:

CREATE INDEX ON :Snapshot(timestamp);
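
The statement above is the form used by Neo4j 2.x (the version in the question). In later releases (4.x onward, as far as I know) indexes are created with the FOR ... ON form instead. The index name snapshot_timestamp below is only an illustrative choice, not something from the question:

```cypher
// Newer index syntax; snapshot_timestamp is a hypothetical index name.
CREATE INDEX snapshot_timestamp FOR (s:Snapshot) ON (s.timestamp);
```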

With some simple sample data, the profile I get is:

+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| Operator          | Estimated Rows | Rows | DB Hits | Variables                            | Other                                |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+
| +ProduceResults   |              1 |    1 |       0 | COUNT(DISTINCT SPrimePackage)        | COUNT(DISTINCT SPrimePackage)        |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +Limit            |              1 |    1 |       0 | COUNT(DISTINCT SPrimePackage)        | Literal(10)                          |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +EagerAggregation |              1 |    1 |       0 | COUNT(DISTINCT SPrimePackage)        |                                      |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +AntiSemiApply    |              1 |    7 |       0 | anon[180], s -- SBase, SPrimePackage |                                      |
| |\                +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(Into)   |              1 |    0 |      34 | anon[266] -- SBase, SPrimePackage    | (SBase)-[:contains]->(SPrimePackage) |
| | |               +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument       |              4 |    8 |       0 | SBase, SPrimePackage                 |                                      |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +CartesianProduct |              4 |    8 |       0 | SBase -- anon[180], SPrimePackage, s |                                      |
| |\                +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All)    |              4 |    8 |      10 | anon[180], SPrimePackage -- s        | (s)-[:contains]->(SPrimePackage)     |
| | |               +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +NodeIndexSeek  |              2 |    2 |       4 | s                                    | :Snapshot(timestamp)                 |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +SemiApply        |              1 |    2 |       0 | SBase                                |                                      |
| |\                +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Expand(All)    |              4 |    0 |       2 | anon[112], anon[126] -- SBase        | (SBase)-[:contains]->()              |
| | |               +----------------+------+---------+--------------------------------------+--------------------------------------+
| | +Argument       |              2 |    2 |       0 | SBase                                |                                      |
| |                 +----------------+------+---------+--------------------------------------+--------------------------------------+
| +NodeIndexSeek    |              2 |    2 |       3 | SBase                                | :Snapshot(timestamp)                 |
+-------------------+----------------+------+---------+--------------------------------------+--------------------------------------+

In addition to using the index, the above query:

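  1. Does not need to find all the nodes contained by SBase, since we only need to find a single contained node to identify a matching SBase node. The SemiApply operation finishes as soon as it finds a single (SBase)-[:contains]->() match, so the first MATCH clause results in a single row per SBase instead of N rows. From the information in your question, I suspect that N would be 8.
  2. Has a fast Cartesian product, since both "legs" of the product should have low cardinality.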
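
The difference described in point 1 can be seen by counting the rows that each form of the first clause produces. This is a sketch that reuses the timestamp from the question; the aliases are illustrative, and the actual numbers depend on your data. Run the two statements separately:

```cypher
// Pattern in MATCH: one row per contained node, so N rows per SBase.
MATCH (SBase:Snapshot {timestamp:1454983481.304583})-[:contains]->()
RETURN count(*) AS rowsPerPatternMatch;

// Pattern in WHERE: only an existence check, so one row per SBase.
MATCH (SBase:Snapshot {timestamp:1454983481.304583})
WHERE (SBase)-[:contains]->()
RETURN count(*) AS rowsPerExistenceCheck;
```

Every extra row from the first form is multiplied into the Cartesian product with the second MATCH, which is why collapsing it to one row per SBase matters.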