My setup:
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Neo4j 2.0.0-M06 Enterprise
First, I make sure the cache is warmed up by running:
START n=node(*) RETURN COUNT(n);
START r=relationship(*) RETURN count(r);
The database contains 63,677 nodes and 7,169,995 relationships.
Now I have the following query:
START u1=node:node_auto_index('uid:39')
MATCH (u1:user)-[w:WANTS]->(c:card)<-[h:HAS]-(u2:user)
WHERE u2.uid <> 39
WITH u2.uid AS uid, (CASE WHEN w.qty < h.qty THEN w.qty ELSE h.qty END) AS have
RETURN uid, SUM(have) AS total
ORDER BY total DESC
SKIP 0
LIMIT 25
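This UID has about 40k+ results, and I want to be able to paginate them. The initial query (SKIP 0) took around 773ms. I tried page 2 (SKIP 25) and the latency was about the same; even at page 500 it only rose to 900ms, so I didn't really worry about it.
Then I tried some fast-forward pagination and skipped by thousands: 1000, then 2000, then 3000. I was hoping that Neo4j would already have the ORDER BY arrangement cached, so that SKIP would just jump to that index in the result without iterating through every row again. Instead, the latency increased a lot with each additional thousand skipped. This is not just cache warming: first, I had already warmed up the cache, and second, I ran each skip value several times and got the same timings: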
SKIP 0: 773ms
SKIP 1000: 1369ms
SKIP 2000: 2491ms
SKIP 3000: 3899ms
SKIP 4000: 5686ms
SKIP 5000: 7424ms
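Now, who on earth would want to view 5000 pages of results? Or even 40k?! :) Good point! I will probably cap the maximum number of results a user can view, but I am curious about this behavior. Can somebody please explain why Neo4j seems to re-iterate through results that it apparently already knows?
Here is my profile for SKIP 0: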
==> ColumnFilter(symKeys=["uid", " INTERNAL_AGGREGATE65c4d6a2-1930-4f32-8fd9-5e4399ce6f14"], returnItemNames=["uid", "total"], _rows=25, _db_hits=0)
==> Slice(skip="Literal(0)", _rows=25, _db_hits=0)
==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE65c4d6a2-1930-4f32-8fd9-5e4399ce6f14 of type Any),false)"], limit="Add(Literal(0),Literal(25))", _rows=25, _db_hits=0)
==> EagerAggregation(keys=["uid"], aggregates=["( INTERNAL_AGGREGATE65c4d6a2-1930-4f32-8fd9-5e4399ce6f14,Sum(have))"], _rows=41659, _db_hits=0)
==> ColumnFilter(symKeys=["have", "u1", "uid", "c", "h", "w", "u2"], returnItemNames=["uid", "have"], _rows=146826, _db_hits=0)
==> Extract(symKeys=["u1", "c", "h", "w", "u2"], exprKeys=["uid", "have"], _rows=146826, _db_hits=587304)
==> Filter(pred="((NOT(Product(u2,uid(0),true) == Literal(39)) AND hasLabel(u1:user(0))) AND hasLabel(u2:user(0)))", _rows=146826, _db_hits=146826)
==> TraversalMatcher(trail="(u1)-[w:WANTS WHERE (hasLabel(NodeIdentifier():card(1)) AND hasLabel(NodeIdentifier():card(1))) AND true]->(c)<-[h:HAS WHERE (NOT(Product(NodeIdentifier(),uid(0),true) == Literal(39)) AND hasLabel(NodeIdentifier():user(0))) AND true]-(u2)", _rows=146826, _db_hits=293696)
And for SKIP 5000:
==> ColumnFilter(symKeys=["uid", " INTERNAL_AGGREGATE99329ea5-03cd-4d53-a6bc-3ad554b47872"], returnItemNames=["uid", "total"], _rows=25, _db_hits=0)
==> Slice(skip="Literal(5000)", _rows=25, _db_hits=0)
==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE99329ea5-03cd-4d53-a6bc-3ad554b47872 of type Any),false)"], limit="Add(Literal(5000),Literal(25))", _rows=5025, _db_hits=0)
==> EagerAggregation(keys=["uid"], aggregates=["( INTERNAL_AGGREGATE99329ea5-03cd-4d53-a6bc-3ad554b47872,Sum(have))"], _rows=41659, _db_hits=0)
==> ColumnFilter(symKeys=["have", "u1", "uid", "c", "h", "w", "u2"], returnItemNames=["uid", "have"], _rows=146826, _db_hits=0)
==> Extract(symKeys=["u1", "c", "h", "w", "u2"], exprKeys=["uid", "have"], _rows=146826, _db_hits=587304)
==> Filter(pred="((NOT(Product(u2,uid(0),true) == Literal(39)) AND hasLabel(u1:user(0))) AND hasLabel(u2:user(0)))", _rows=146826, _db_hits=146826)
==> TraversalMatcher(trail="(u1)-[w:WANTS WHERE (hasLabel(NodeIdentifier():card(1)) AND hasLabel(NodeIdentifier():card(1))) AND true]->(c)<-[h:HAS WHERE (NOT(Product(NodeIdentifier(),uid(0),true) == Literal(39)) AND hasLabel(NodeIdentifier():user(0))) AND true]-(u2)", _rows=146826, _db_hits=293696)
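The only difference I can see is the limit on the Top step. I hope this can be made to work as I expected; I really don't want to resort to embedded Neo4j plus my own Jetty REST API for the web app.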
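One way to read the two profiles: both runs repeat the full traversal and aggregation (146826 rows in, 41659 groups out), and Top is asked for skip + limit rows (5025 in the second profile) before Slice discards the first skip of them. The Python sketch below only mirrors that plan shape so the growing cost is easier to see; it is not Neo4j's actual implementation, and the function name `run_page` and the synthetic rows are assumptions made for illustration.

```python
import heapq
from collections import defaultdict

def run_page(rows, skip, limit):
    """Mimic the profiled plan shape: EagerAggregation -> Top -> Slice.

    rows is an iterable of (uid, have) pairs, like the output of Extract.
    """
    # EagerAggregation: every input row is consumed on every call,
    # regardless of the skip value.
    totals = defaultdict(int)
    for uid, have in rows:
        totals[uid] += have

    # Top: keep the (skip + limit) largest totals, sorted descending.
    top = heapq.nlargest(skip + limit, totals.items(), key=lambda kv: kv[1])

    # Slice: drop the first `skip` rows and return what is left.
    return top[skip:skip + limit]

# Synthetic data for illustration only: 100 users with 3 rows each,
# so the total for a user is 3 + uid.
rows = [(uid, qty) for uid in range(100) for qty in (1, 2, uid)]
page = run_page(rows, skip=10, limit=5)
print(page)
```

In this sketch nothing survives between calls, so a larger skip means Top has to select and sort more rows each time, which is at least consistent with the timings listed above.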
Answer 0 (score: 2)
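Results are not cached; otherwise the server would hold a large amount of memory for results that will most likely never be used.
As you said yourself, people are interested in the first page or two and then refine their search.
If you need more predictable paging performance, pull more results from Neo4j in the first place, store them in your user session, and serve pages from there. For this you can draw on far more contextual information than the database has (e.g. a user behavior profile or a power-user flag).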
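A minimal sketch of that session-side approach, assuming the application can call some function that runs the Cypher query with a given SKIP and LIMIT. The class name `PagedResultCache`, the `fetch` callable, and the prefetch size of 1000 are all hypothetical; `fake_fetch` stands in for the real database call.

```python
class PagedResultCache:
    """Holds one over-fetched, already ordered result list per user session."""

    def __init__(self, fetch, prefetch=1000, page_size=25):
        self._fetch = fetch          # callable(skip, limit) -> list of rows
        self._prefetch = prefetch    # rows pulled from the database per query
        self._page_size = page_size
        self._rows = []
        self._exhausted = False

    def page(self, number):
        """Return page `number` (0-based), querying only when needed."""
        start = number * self._page_size
        end = start + self._page_size
        # Go back to the database only if the requested page lies beyond
        # the rows already held in the session.
        while end > len(self._rows) and not self._exhausted:
            batch = self._fetch(len(self._rows), self._prefetch)
            self._rows.extend(batch)
            self._exhausted = len(batch) < self._prefetch
        return self._rows[start:end]

# Stand-in for the real query; records each call so the saving is visible.
calls = []
data = [(uid, 5000 - uid) for uid in range(2500)]

def fake_fetch(skip, limit):
    calls.append((skip, limit))
    return data[skip:skip + limit]

cache = PagedResultCache(fake_fetch)
first = cache.page(0)    # triggers one query for 1000 rows
second = cache.page(1)   # served from the session, no query
```

With these numbers the first 40 pages cost a single query. Later batches still pay the SKIP cost described in the question, and the cached rows can go stale if the graph changes, so the prefetch size and any expiry policy are trade-offs for the application to decide.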