如何在ArangoDb中的下面的json文档中进行全文索引和搜索?

时间:2016-01-14 15:01:50

标签: arangodb arangodb-php arangojs

{
"batters":
    {
    "batter":[
            { "id": "1001", "type": "Regular" },
            { "id": "1002", "type": "Chocolate" },
            { "id": "1003", "type": "Blueberry" },
            { "id": "1004", "type": "Devil's Food" }
    ]
    },
    "topping":[
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
     ]
}

基本上要在这里进行全文搜索,我需要对" batters.batter"进行索引。还有" batters.topping"即两个属性。如何处理这种全文搜索。请解释一下该方法,我将通过REST API实现我的搜索。提前感谢你。

1 个答案:

答案 0 :(得分:5)

解决此问题的最佳方法是稍微更改数据布局,因为全文索引只能在一个属性上工作,而请求索引两次不会很快。因此,我们使用匿名图将字符串连接到它们的对象。

因此,我们创建了两个(顶点)集合,一个边集合,一个带有fultext索引的顶点集合:

db._create("dishStrings")
db._createEdgeCollection("dishEdges")
db._create("dish")

db.dishStrings.ensureIndex({type: "fulltext", fields: [ "name" ]});

将文件保存给他们,并将关系绑在一起。 我们使用_key属性来引用_from_to边缘关系中的顶点:

db.dishStrings.save({"_key": "1001", "name": "Regular" , type: "Batter"});
db.dishStrings.save({"_key": "1002", "name": "Chocolate", type: "Batter" });
db.dishStrings.save({"_key": "1003", "name": "Blueberry", type: "Batter"});
db.dishStrings.save({"_key": "1004", "name": "Devil's Food", type: "Batter"});
db.dishStrings.save({"_key": "5001", "name": "None", type: "Topping"});
db.dishStrings.save({"_key": "5002", "name": "Glazed", type: "Topping"});
db.dishStrings.save({"_key": "5005", "name": "Sugar", type: "Topping"});
db.dishStrings.save({"_key": "5007", "name": "Powdered Sugar", type: "Topping"});
db.dishStrings.save({"_key": "5006", "name": "Chocolate with Sprinkles", type: "Topping"});
db.dishStrings.save({"_key": "5003", "name": "Chocolate", type: "Topping"});
db.dishStrings.save({"_key": "5004", "name": "Maple", type: "Topping"});

db.dishEdges.save("dishStrings/1001", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/1002", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/1003", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/1004", "dish/batter", {tasty: true, type: "Batter"})
db.dishEdges.save("dishStrings/5001", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5002", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5003", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5004", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5005", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5006", "dish/batter", {tasty: true, type: "Topping"})
db.dishEdges.save("dishStrings/5007", "dish/batter", {tasty: true, type: "Topping"})

db.dish.save({_key: "batter", tasty: true})

我们重新验证全文索引将起作用:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', 'Chocolate')" +
          " RETURN oneDishStr").toArray()

.toArray()会在控制台上打印出结果) 我们得到3个安打,一个击球手,两个浇头。由于搜索字符串可能包含未经验证的字符串,因此我们宁愿use bind variables to circumvent injections

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) " + 
          " RETURN oneDishStr", 
          {searchString: "Chocolate"});

现在让我们使用边缘关系来找到连接的菜:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) "+ 
          "RETURN {str: oneDishStr, " + 
                  "dishes: NEIGHBORS(dishStrings, dishEdges, oneDishStr," + 
                                     " 'outbound')}",
           {searchString: "Chocolate"})

这是使用图表的旧(最多2.7)方式,因为我们想要使用快速过滤器,lets translate this to the new 2.8 syntax

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) " + 
          "  FOR v IN 1..1 OUTBOUND oneDishStr dishEdges RETURN " + 
          "    {str: oneDishStr, dish: v}",
         {searchString: "Chocolate"})

我们可以看到,在这两种情况下,我们都会为Chocolate的3个全文搜索命中中的每一个获得一次遍历。现在我们只对Toppings的匹配感兴趣,因此我们将过滤所有不属于Topping类型的边:

db._query("FOR oneDishStr IN FULLTEXT(dishStrings, 'name', @searchString) "+
          "   FOR v, e IN 1..1 OUTBOUND oneDishStr dishEdges " + 
          "      FILTER e.type == 'Topping' " +
          "         RETURN {str: oneDishStr, dish: v}", 
          {searchString: "Chocolate"})