I am seeing a huge runtime difference between the following two AQL statements on a database with about 20 million records:
FOR e IN EAll
FILTER e.lastname == "Kmp" // <-- skip-index
FILTER e.lastpaff != "" // <-- no index
RETURN e
// runs in less than a second
and
FOR e IN EAll
FILTER e.lastpaff != "" // <-- no index
FILTER e.lastname == "Kmp" // <-- skip-index
RETURN e
// needs about a minute to execute.
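Apart from the index (or the lack of one), the two conditions also differ greatly in selectivity: the indexedAttribute (lastname) is highly selective, whereas the nonIndexedAttribute (lastpaff) filters out only about 50% of the records.
Is it possible that there is no optimizer rule for this case yet? I am currently using ArangoDB 2.4.0.
Details:
There is a skiplist index on the indexed attribute (it appears to be used in execution plan 1). Here are the two execution plans; the only change between them is the order of the filters: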
FAST QUERY:
arangosh [Uni]> stmt.explain()
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "IndexRangeNode",
"dependencies" : [
1
],
"id" : 8,
"estimatedCost" : 170463.32,
"estimatedNrItems" : 170462,
"database" : "Uni",
"collection" : "EAll",
"outVariable" : {
"id" : 0,
"name" : "i"
},
"ranges" : [
[
{
"variable" : "i",
"attr" : "lastname",
"lowConst" : {
"bound" : "Kmp",
"include" : true,
"isConstant" : true
},
"highConst" : {
"bound" : "Kmp",
"include" : true,
"isConstant" : true
},
"lows" : [ ],
"highs" : [ ],
"valid" : true,
"equality" : true
}
]
],
"index" : {
"type" : "skiplist",
"id" : "13295598550318",
"unique" : false,
"fields" : [
"lastname"
]
},
"reverse" : false
},
{
"type" : "CalculationNode",
"dependencies" : [
8
],
"id" : 5,
"estimatedCost" : 340925.32,
"estimatedNrItems" : 170462,
"expression" : {
"type" : "compare !=",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastpaff",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : ""
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 511387.32,
"estimatedNrItems" : 170462,
"inVariable" : {
"id" : 2,
"name" : "2"
}
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 681849.3200000001,
"estimatedNrItems" : 170462,
"inVariable" : {
"id" : 0,
"name" : "i"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2",
"use-index-range",
"remove-filter-covered-by-index"
],
"collections" : [
{
"name" : "EAll",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "i"
},
{
"id" : 1,
"name" : "1"
},
{
"id" : 2,
"name" : "2"
}
],
"estimatedCost" : 681849.3200000001,
"estimatedNrItems" : 170462
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 19,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
SLOW QUERY:
arangosh [Uni]> stmt.explain()
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "EnumerateCollectionNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 17046233,
"estimatedNrItems" : 17046232,
"database" : "Uni",
"collection" : "EAll",
"outVariable" : {
"id" : 0,
"name" : "i"
},
"random" : false
},
{
"type" : "CalculationNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 34092465,
"estimatedNrItems" : 17046232,
"expression" : {
"type" : "compare !=",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastpaff",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : ""
}
]
},
"outVariable" : {
"id" : 1,
"name" : "1"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
3
],
"id" : 4,
"estimatedCost" : 51138697,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 1,
"name" : "1"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
4
],
"id" : 5,
"estimatedCost" : 68184929,
"estimatedNrItems" : 17046232,
"expression" : {
"type" : "compare ==",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastname",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : "Kmp"
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 85231161,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 2,
"name" : "2"
}
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 102277393,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 0,
"name" : "i"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2"
],
"collections" : [
{
"name" : "EAll",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "i"
},
{
"id" : 1,
"name" : "1"
},
{
"id" : 2,
"name" : "2"
}
],
"estimatedCost" : 102277393,
"estimatedNrItems" : 17046232
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 19,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
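The two plans are long, but the difference that matters is the node that reads the collection: IndexRangeNode in the fast plan, EnumerateCollectionNode in the slow one. A minimal sketch for spotting this programmatically is shown here. The helper names planNodeTypes and usesIndex are hypothetical (they are not part of ArangoDB), and the two plan objects are trimmed-down stand-ins that keep only the "type" fields from the output above.

```javascript
// Hypothetical helper: list the node types of an explain() result, in plan order.
function planNodeTypes(explainResult) {
  return explainResult.plan.nodes.map(function (node) {
    return node.type;
  });
}

// Hypothetical helper: true if the plan reads the collection through an index.
function usesIndex(explainResult) {
  return planNodeTypes(explainResult).indexOf("IndexRangeNode") !== -1;
}

// Trimmed-down stand-ins for the two plans above (only "type" is kept).
var fastPlan = { plan: { nodes: [
  { type: "SingletonNode" }, { type: "IndexRangeNode" },
  { type: "CalculationNode" }, { type: "FilterNode" }, { type: "ReturnNode" }
] } };
var slowPlan = { plan: { nodes: [
  { type: "SingletonNode" }, { type: "EnumerateCollectionNode" },
  { type: "CalculationNode" }, { type: "FilterNode" },
  { type: "CalculationNode" }, { type: "FilterNode" }, { type: "ReturnNode" }
] } };

console.log(usesIndex(fastPlan)); // true
console.log(usesIndex(slowPlan)); // false
```

In arangosh, the same helpers could be applied directly to the object returned by stmt.explain().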
Answer 0 (score: 1)
Indeed, the following combination of conditions disabled the use of the index, even though an index could have been used:
FILTER doc.indexedAttribute != ...
FILTER doc.indexedAttribute == ...
Interestingly, the index is used when both conditions are placed in the same FILTER statement and combined with &&:
FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ...
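Although the two statements are equivalent, they trigger slightly different code paths. The former AND-combines two existing FILTER ranges, whereas the latter produces a single range from one FILTER. The AND-combination of FILTER ranges was overly defensive: it rejected both sides even if only one side (here, the one with the non-equality operator) could not be used for an index scan.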
This has been fixed in 2.4, and the fix will be included in 2.4.2. For now, the workaround is to combine the two FILTER statements into one.
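Applied to the query from the question, the workaround might look like the sketch below. The collection and attribute names (EAll, lastpaff, lastname) come from the question; the variable name combinedQuery is only illustrative. The arangosh part is guarded because db exists only inside arangosh, and whether the resulting plan actually contains an IndexRangeNode on a given server version is something to verify with explain() rather than assume.

```javascript
// Workaround from the answer: put both conditions into a single FILTER,
// joined with &&, instead of using two separate FILTER statements.
var combinedQuery = [
  'FOR e IN EAll',
  '  FILTER e.lastpaff != "" && e.lastname == "Kmp"',
  '  RETURN e'
].join("\n");

// `db` and `print` are only defined inside arangosh; elsewhere this block
// just builds the query string and does nothing else.
if (typeof db !== "undefined") {
  var stmt = db._createStatement({ query: combinedQuery });
  // Inspect the plan to check whether the skiplist index is picked up.
  print(stmt.explain());
}
```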