查询非索引属性FILTER的规则

时间:2015-01-21 11:38:01

标签: arangodb

我观察到这两个AQL语句之间存在巨大的运行时差异,数据库设置有大约20 Mio记录:

FOR e IN EAll
    FILTER e.lastname == "Kmp"  // <-- skip-index
    FILTER e.lastpaff != ""     // <-- no index
RETURN e
// runs in less than a second

FOR e IN EAll
    FILTER e.lastpaff != ""     // <-- no index
    FILTER e.lastname == "Kmp"  // <-- skip-index
RETURN e
// needs about a minute to execute.

除了(或没有)索引之外,这些语句的选择性也大不相同:indexedAttribute在其中具有高度选择性 - 因为nonIndexedAttribute只过滤50%。

是否有可能还没有优化规则?我目前正在使用ArangoDB 2.4.0。

详情:

索引属性上有一个SKIP-Index(似乎在执行计划1中使用)。 以下是执行计划,其中仅更改了过滤器的顺序:

    FAST QUERY:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "IndexRangeNode",
            "dependencies" : [
              1
            ],
            "id" : 8,
            "estimatedCost" : 170463.32,
            "estimatedNrItems" : 170462,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "ranges" : [
              [
                {
                  "variable" : "i",
                  "attr" : "lastname",
                  "lowConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "highConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "lows" : [ ],
                  "highs" : [ ],
                  "valid" : true,
                  "equality" : true
                }
              ]
            ],
            "index" : {
              "type" : "skiplist",
              "id" : "13295598550318",
              "unique" : false,
              "fields" : [
                "lastname"
              ]
            },
            "reverse" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              8
            ],
            "id" : 5,
            "estimatedCost" : 340925.32,
            "estimatedNrItems" : 170462,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 511387.32,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 681849.3200000001,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2",
          "use-index-range",
          "remove-filter-covered-by-index"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 681849.3200000001,
        "estimatedNrItems" : 170462
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }



    SLOW Query:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "EnumerateCollectionNode",
            "dependencies" : [
              1
            ],
            "id" : 2,
            "estimatedCost" : 17046233,
            "estimatedNrItems" : 17046232,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "random" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              2
            ],
            "id" : 3,
            "estimatedCost" : 34092465,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 1,
              "name" : "1"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              3
            ],
            "id" : 4,
            "estimatedCost" : 51138697,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 1,
              "name" : "1"
            }
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              4
            ],
            "id" : 5,
            "estimatedCost" : 68184929,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare ==",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastname",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : "Kmp"
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 85231161,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 102277393,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 102277393,
        "estimatedNrItems" : 17046232
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }

1 个答案:

答案 0 :(得分:1)

实际上,即使可以使用索引,以下条件也禁用了索引的使用:

FILTER doc.indexedAttribute != ... FILTER doc.indexedAttribute == ...

有趣的是,当两个条件置于相同的FILTER条件并与&&结合使用时,会使用索引:

FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ...

虽然这两个语句是等价的,但它们会触发略微不同的代码路径。前者将AND组合两个现有FILTER范围,后者将产生一个范围FILTERFILTER范围的AND组合的情况过于防御,即使只有一方(在这种情况下是具有非等式运算符的一方)不能用于索引扫描,也会被双方拒绝。 / p>

这已在2.4中修复,修复程序将包含在2.4.2中。现在的解决方法是将两个FILTER语句组合在一个语句中。