Range query on an indexed property

Asked: 2017-01-27 15:42:28

Tags: indexing neo4j range

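Is there a maximum range size when running a range query on an indexed property?

To clarify: I have an indexed timestamp property, stored in milliseconds, and I am trying to fetch all events that occurred within one month. My query looks like this: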

Match (e:Event)-[R:type{'has metadata'}]-> (S:EventMetaData) where e.type=~".*ELec.*" AND e.timestamp IN RANGE (1480550400000,1483228740000)  return S.Location, sum(e.value) as sumV  order by  sumV DESC

But I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Cannot index an collection of size 2678340001
at org.neo4j.cypher.internal.compiler.v3_2.commands.expressions.IndexedInclusiveLongRange.length(IndexedInclusiveLongRange.scala:51)
at scala.collection.SeqLike$class.size(SeqLike.scala:106)
at org.neo4j.cypher.internal.compiler.v3_2.commands.expressions.IndexedInclusiveLongRange.size(IndexedInclusiveLongRange.scala:30)
at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69)
at scala.collection.mutable.SetBuilder.sizeHint(SetBuilder.scala:20)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:589)
at org.neo4j.cypher.internal.compiler.v3_2.commands.expressions.IndexedInclusiveLongRange.to(IndexedInclusiveLongRange.scala:30)
at scala.collection.TraversableOnce$class.toSet(TraversableOnce.scala:304)
at org.neo4j.cypher.internal.compiler.v3_2.commands.expressions.IndexedInclusiveLongRange.toSet(IndexedInclusiveLongRange.scala:30)
at org.neo4j.cypher.internal.compiler.v3_2.commands.indexQuery$.apply(indexQuery.scala:46)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.NodeIndexSeekPipe.internalCreateResults(NodeIndexSeekPipe.scala:48)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.Pipe$class.createResults(Pipe.scala:51)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.NodeIndexSeekPipe.createResults(NodeIndexSeekPipe.scala:29)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.PipeWithSource.createResults(Pipe.scala:79)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.PipeWithSource.createResults(Pipe.scala:79)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.PipeWithSource.createResults(Pipe.scala:79)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.PipeWithSource.createResults(Pipe.scala:79)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.PipeWithSource.createResults(Pipe.scala:79)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.PipeWithSource.createResults(Pipe.scala:79)
at org.neo4j.cypher.internal.compiler.v3_2.pipes.PipeWithSource.createResults(Pipe.scala:79)
at org.neo4j.cypher.internal.compiler.v3_2.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.createResults(DefaultExecutionResultBuilderFactory.scala:95)
at org.neo4j.cypher.internal.compiler.v3_2.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.build(DefaultExecutionResultBuilderFactory.scala:73)
at org.neo4j.cypher.internal.compiler.v3_2.BuildInterpretedExecutionPlan$$anonfun$getExecutionPlanFunction$1.apply(BuildInterpretedExecutionPlan.scala:99)
at org.neo4j.cypher.internal.compiler.v3_2.BuildInterpretedExecutionPlan$$anonfun$getExecutionPlanFunction$1.apply(BuildInterpretedExecutionPlan.scala:83)
at org.neo4j.cypher.internal.compiler.v3_2.BuildInterpretedExecutionPlan$$anon$1.run(BuildInterpretedExecutionPlan.scala:54)
at org.neo4j.cypher.internal.compatibility.v3_2.Compatibility$ExecutionPlanWrapper$$anonfun$run$1.apply(Compatibility.scala:96)
at org.neo4j.cypher.internal.compatibility.v3_2.Compatibility$ExecutionPlanWrapper$$anonfun$run$1.apply(Compatibility.scala:94)
at org.neo4j.cypher.internal.compatibility.v3_2.exceptionHandler$runSafely$.apply(exceptionHandler.scala:84)
at org.neo4j.cypher.internal.compatibility.v3_2.Compatibility$ExecutionPlanWrapper.run(Compatibility.scala:94)

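It seems odd that Neo4j tries to allocate a collection of size endRange - startRange, as the error states. I know I can work around this by storing the timestamp in hours or days, but I would still like to know why range queries on indexed properties are slow in Neo4j, and whether there is a maximum allowed range size.

P.S. I increased the Neo4j heap and page cache sizes, but range queries on the indexed property are still slow.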

1 answer:

Answer 0 (score: 2)

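You are using a very inefficient technique to test for a range (even if it worked). The RANGE function is defined to generate a collection of N+1 values, where N is the difference between the upper and lower bounds of the range, and the IN operation would, in the worst case, compare against every item in that collection.

You should change your query slightly so that only 2 numeric comparisons are made per row: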
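To see where the size in the error message comes from, the arithmetic can be sketched in Python. This is only an illustration, not Neo4j code: the names `lower`, `upper`, `range_size`, `INT32_MAX`, and `in_bounds` are hypothetical, and the signed 32-bit limit is an assumption based on the collection size being computed on the JVM.

```python
# Bounds taken from the query in the question (milliseconds since the epoch).
lower = 1480550400000
upper = 1483228740000

# An inclusive integer range holds N + 1 values, where N = upper - lower.
range_size = upper - lower + 1

# Assumption: the collection size has to fit in a signed 32-bit integer.
INT32_MAX = 2**31 - 1

print(range_size)              # 2678340001, the size reported in the error
print(range_size > INT32_MAX)  # True


def in_bounds(timestamp):
    """Check the range with two numeric comparisons instead of a membership test."""
    return lower <= timestamp <= upper
```

The two-comparison check gives the same answer as membership in the full range for integer timestamps, without ever building the collection.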

MATCH (e:Event)-[R:type{'has metadata'}]-> (S:EventMetaData)
WHERE e.type=~".*ELec.*" AND 1480550400000 <= e.timestamp <= 1483228740000
RETURN S.Location, sum(e.value) AS sumV
ORDER BY  sumV DESC;