ElasticSearch: Filtering on deeply nested data

Asked: 2014-01-23 16:33:35

Tags: elasticsearch

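Our data is stored in MongoDB 2.4.8 and indexed into ElasticSearch 0.90.7 using ElasticSearch MongoDB River 1.7.3.

Our data indexes correctly, and I can successfully search the fields we want to search. But I also need to filter on permissions: of course, we only want to return results that the calling user is actually allowed to read.

In the code on our server, I have the calling user's authorizations as an array, for example: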

[ "Role:REGISTERED_USER", "Account:52c74b25da06f102c90d52f4", "Role:USER", "Group:52cb057cda06ca463e78f0d7" ]

An example of the unit data we are searching looks like this:

{
    "_id" : ObjectId("52dffbd6da06422559386f7d"),
    "content" : "various stuff",
    "ownerId" : ObjectId("52d96bfada0695fcbdb41daf"),
    "acls" : [
        {
            "accessMap" : {},
            "sourceClass" : "com.bulb.learn.domain.units.PublishedPageUnit",
            "sourceId" : ObjectId("52dffbd6da06422559386f7d")
        },
        {
            "accessMap" : {
                "Role:USER" : {
                    "allow" : [
                        "READ"
                    ]
                },
                "Account:52d96bfada0695fcbdb41daf" : {
                    "allow" : [
                        "CREATE",
                        "READ",
                        "UPDATE",
                        "DELETE",
                        "GRANT"
                    ]
                }
            },
            "sourceClass" : "com.bulb.learn.domain.units.CompositeUnit",
            "sourceId" : ObjectId("52dffb54da06422559386f57")
        }
    ]
}

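In the sample data above, I have replaced all of the searchable content with "content" : "various stuff".

The authorization data is in the "acls" array. The filter I need to write would do the following (in English):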

pass all units where the "acls" array
contains an "accessMap" object
that contains a property whose name is one of the user's authorization strings
and whose "allow" property contains "READ"
and whose "deny" property does not contain "READ"

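In the example above, the user has the "Role:USER" authorization, and this unit has an accessMap with a "Role:USER" entry whose "allow" contains "READ" and which has no "deny" containing "READ". So this unit would pass the filter.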
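As a point of reference, the rule can be written as plain code running over one unit. This is only a sketch of one literal reading of the rule (the deny check is applied to the same accessMap entry that grants the permission); the function name `can_read` is made up for illustration, and the ObjectId values are shown as plain strings.

```python
def can_read(unit, authorizations):
    """Return True if some accessMap entry named after one of the user's
    authorizations allows READ and does not deny READ."""
    for acl in unit.get("acls", []):
        access_map = acl.get("accessMap", {})
        for subject in authorizations:
            entry = access_map.get(subject)
            if entry is None:
                continue  # this ACL says nothing about this authorization
            allowed = "READ" in entry.get("allow", [])
            denied = "READ" in entry.get("deny", [])
            if allowed and not denied:
                return True
    return False


# The sample unit from the question, reduced to the fields the rule needs.
SAMPLE_UNIT = {
    "content": "various stuff",
    "acls": [
        {"accessMap": {}},
        {
            "accessMap": {
                "Role:USER": {"allow": ["READ"]},
                "Account:52d96bfada0695fcbdb41daf": {
                    "allow": ["CREATE", "READ", "UPDATE", "DELETE", "GRANT"]
                },
            }
        },
    ],
}

USER_AUTHORIZATIONS = [
    "Role:REGISTERED_USER",
    "Account:52c74b25da06f102c90d52f4",
    "Role:USER",
    "Group:52cb057cda06ca463e78f0d7",
]
```

Calling `can_read(SAMPLE_UNIT, USER_AUTHORIZATIONS)` is what the ElasticSearch filter would have to reproduce on the index side.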

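I don't see how to write a filter for this using ElasticSearch.

My impression is that there are two ways to deal with nested arrays: "nested", or "has_child" (or "has_parent").

We are reluctant to use the "nested" filter because it apparently requires the whole block to be re-indexed whenever any of its data changes. Both the searchable content and the authorization data can change at any time in response to user actions.

It seems to me that in order to use "has_child" or "has_parent", the authorization data would have to be separated from the unit data (in a different collection?), and when a node is indexed it would have to specify its parent or child. I don't know whether the ElasticSearch MongoDB River is capable of doing that.

Is this even possible? Or should we rearrange the authorization data?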

2 Answers:

Answer 0: (score: 9)

You need to restructure your data.

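Having values in keys is problematic with Elasticsearch. Every distinct key ends up as a separate field, so you get an ever-growing mapping, and therefore an ever-growing cluster state.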
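To make the growth concrete, here is a rough sketch that lists the dotted field paths a document would contribute, comparing the keyed layout with a list-of-objects layout. The helper `field_paths` is hypothetical and only approximates how field names are derived from JSON; it is not Elasticsearch code.

```python
def field_paths(value, prefix=""):
    """Collect dotted leaf paths; all elements of a list share one path."""
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= field_paths(child, child_prefix)
        return paths
    if isinstance(value, list):
        paths = set()
        for item in value:
            paths |= field_paths(item, prefix)
        return paths
    return {prefix}


# Hypothetical subjects, only for illustration.
SUBJECTS = ["Role:USER", "Account:1", "Account:2"]

# Keyed layout: the subject is a property name, so each subject is a new field.
keyed = {"accessMap": {s: {"allow": ["READ"]} for s in SUBJECTS}}

# List layout: the subject is a value of "key", so the field set stays fixed.
listed = {"accessMap": [{"key": s, "allow": ["READ"]} for s in SUBJECTS]}

keyed_paths = field_paths(keyed)
listed_paths = field_paths(listed)
```

With the keyed layout the number of paths follows the number of subjects; with the list layout it stays at `accessMap.key` and `accessMap.allow` no matter how many subjects exist.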

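You probably want accessMap to be a list of objects, with what is currently the key stored as a value. It then has to be nested; otherwise there is no way to know which accessMap entry a matching allow belongs to.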
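A minimal sketch of that restructuring, assuming it is done in application code before the document is indexed. The function name `restructure_acls` is hypothetical; the field name "key" matches the mapping used in the example script below.

```python
def restructure_acls(acls):
    """Turn each keyed accessMap into a list of {"key", "allow", "deny"} objects."""
    result = []
    for acl in acls:
        entries = []
        # Sort by subject so the output order is stable.
        for subject, rights in sorted(acl.get("accessMap", {}).items()):
            entry = {"key": subject}
            if rights.get("allow"):
                entry["allow"] = list(rights["allow"])
            if rights.get("deny"):
                entry["deny"] = list(rights["deny"])
            entries.append(entry)
        # Keep the other ACL fields (sourceClass, sourceId, ...) untouched.
        converted = {k: v for k, v in acl.items() if k != "accessMap"}
        converted["accessMap"] = entries
        result.append(converted)
    return result


ORIGINAL_ACLS = [
    {
        "accessMap": {},
        "sourceClass": "com.bulb.learn.domain.units.PublishedPageUnit",
    },
    {
        "accessMap": {
            "Role:USER": {"allow": ["READ"]},
            "Account:52d96bfada0695fcbdb41daf": {"allow": ["READ", "UPDATE"]},
        },
        "sourceClass": "com.bulb.learn.domain.units.CompositeUnit",
    },
]

RESTRUCTURED_ACLS = restructure_acls(ORIGINAL_ACLS)
```

Whether this conversion can be hooked into the MongoDB River is a separate question; the sketch only shows the target shape.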

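Whether the ACLs should be nested (resulting in two levels of nesting) or parent-child depends on how often the various objects are updated. By making them nested documents of the object, you pay the cost of the join on every update. With parent-child, you pay the cost of the join on every search.

This quickly gets complicated, so I have prepared a simplified, runnable example you can play with: https://www.found.no/play/gist/8582654

Note how the nested and bool filters are nested. Having two nested in a bool will not work.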

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {}
    },
    "mappings": {
        "type": {
            "properties": {
                "acls": {
                    "type": "nested",
                    "properties": {
                        "accessMap": {
                            "type": "nested",
                            "properties": {
                                "allow": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "deny": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "key": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type","_id":1}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"type","_id":2}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","deny":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"type","_id":3}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "filtered": {
            "filter": {
                "nested": {
                    "path": "acls",
                    "filter": {
                        "bool": {
                            "must": {
                                "nested": {
                                    "path": "acls.accessMap",
                                    "filter": {
                                        "bool": {
                                            "must": [
                                                {
                                                    "term": {
                                                        "allow": "READ"
                                                    }
                                                },
                                                {
                                                    "terms": {
                                                        "key": [
                                                            "Role:USER",
                                                            "Account:52d96bfada0695fcbdb41daf"
                                                        ]
                                                    }
                                                }
                                            ]
                                        }
                                    }
                                }
                            },
                            "must_not": {
                                "nested": {
                                    "path": "acls.accessMap",
                                    "filter": {
                                        "bool": {
                                            "must": [
                                                {
                                                    "term": {
                                                        "deny": "READ"
                                                    }
                                                },
                                                {
                                                    "terms": {
                                                        "key": [
                                                            "Role:USER",
                                                            "Account:52d96bfada0695fcbdb41daf"
                                                        ]
                                                    }
                                                }
                                            ]
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
'
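Since the user's authorizations arrive as an array in server code, the search body above would normally be generated rather than written by hand. The following is a sketch that produces the same structure as the query in the script; the function names are hypothetical and the field names are taken from the mapping above.

```python
import json


def subject_clause(subjects, field, permission):
    """Nested filter: one accessMap entry must match both the key and the permission."""
    return {
        "nested": {
            "path": "acls.accessMap",
            "filter": {
                "bool": {
                    "must": [
                        {"term": {field: permission}},
                        {"terms": {"key": list(subjects)}},
                    ]
                }
            },
        }
    }


def build_acl_filter(subjects, permission="READ"):
    """Outer nested filter on acls, wrapping the allow and deny checks."""
    return {
        "nested": {
            "path": "acls",
            "filter": {
                "bool": {
                    "must": subject_clause(subjects, "allow", permission),
                    "must_not": subject_clause(subjects, "deny", permission),
                }
            },
        }
    }


def build_search_body(subjects, permission="READ"):
    """Full request body using the filtered query, as in the script."""
    return {"query": {"filtered": {"filter": build_acl_filter(subjects, permission)}}}


body = build_search_body(["Role:USER", "Account:52d96bfada0695fcbdb41daf"])
payload = json.dumps(body)  # string to send as the request body
```

The resulting `payload` can be passed to curl with -d or sent by any HTTP client.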

For completeness, here is a similar example for the parent-child approach: https://www.found.no/play/gist/8586840

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {}
    },
    "mappings": {
        "acl": {
            "_parent": {
                "type": "document"
            },
            "properties": {
                "acls": {
                    "properties": {
                        "accessMap": {
                            "type": "nested",
                            "properties": {
                                "key": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "allow": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "deny": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"document","_id":1}}
{"title":"Doc 1"}
{"index":{"_index":"play","_type":"acl","_parent":1}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"document","_id":2}}
{"title":"Doc 2"}
{"index":{"_index":"play","_type":"acl","_parent":2}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","deny":["READ","UPDATE"]}]}]}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "filtered": {
            "filter": {
                "has_child": {
                    "type": "acl",
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "nested": {
                                        "path": "acls.accessMap",
                                        "filter": {
                                            "bool": {
                                                "must": [
                                                    {
                                                        "terms": {
                                                            "key": [
                                                                "Role:USER",
                                                                "Account:52d96bfada0695fcbdb41daf"
                                                            ]
                                                        }
                                                    },
                                                    {
                                                        "term": {
                                                            "allow": "READ"
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    }
                                }
                            ],
                            "must_not": [
                                {
                                    "nested": {
                                        "path": "acls.accessMap",
                                        "filter": {
                                            "bool": {
                                                "must": [
                                                    {
                                                        "terms": {
                                                            "key": [
                                                                "Role:USER",
                                                                "Account:52d96bfada0695fcbdb41daf"
                                                            ]
                                                        }
                                                    },
                                                    {
                                                        "term": {
                                                            "deny": "READ"
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    }
}
'
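In the parent-child variant the ACL lives in its own "acl" document that names its parent, so changing permissions only means re-indexing that child. Below is a sketch of generating the bulk body used in the script; `bulk_lines` is a hypothetical helper, and the index and type names ("play", "document", "acl") are the ones from the example.

```python
import json


def bulk_lines(documents):
    """Build a bulk body from (doc_id, source, acl_source) tuples.

    Each document yields two action/source pairs: the parent document and
    an acl child whose action line points at the parent with "_parent".
    """
    lines = []
    for doc_id, source, acl_source in documents:
        lines.append(json.dumps(
            {"index": {"_index": "play", "_type": "document", "_id": doc_id}}))
        lines.append(json.dumps(source))
        lines.append(json.dumps(
            {"index": {"_index": "play", "_type": "acl", "_parent": doc_id}}))
        lines.append(json.dumps(acl_source))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


bulk_body = bulk_lines([
    (1, {"title": "Doc 1"},
     {"acls": [{"accessMap": [{"key": "Role:USER", "allow": ["READ"]}]}]}),
    (2, {"title": "Doc 2"},
     {"acls": [{"accessMap": [{"key": "Role:USER", "deny": ["READ"]}]}]}),
])
```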

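Answer 1: (score: -5)

Thanks @Alex Brasetvik. Your suggestion to make the subject IDs data rather than keys, and your explanation that nested means "join per update" while parent-child means "join per query", were most helpful.

I found that I would have to "un-nest" the data in order to use the parent-child approach, and we would prefer to keep the authorization data nested.

I don't understand what you meant by "Having two nested in a bool will not work."

Here is how I restructured the data: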

{
    "_id" : ObjectId("52dffbd6da06422559386f7d"),
    "content" : "various stuff",
    "ownerId" : ObjectId("52d96bfada0695fcbdb41daf"),
    "accessMaps" : [
        {
            "sourceClass" : "com.bulb.learn.domain.units.PublishedPageUnit",
            "sourceId" : ObjectId("52dffbd6da06422559386f7d")
        },
        {
            "allow" : {
                "CREATE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "READ" : [
                    "Account:52d96bfada0695fcbdb41daf",
                    "Role:USER"
                ],
                "UPDATE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "DELETE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "GRANT" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ]
            },
            "deny" : {},
            "sourceClass" : "com.bulb.learn.domain.units.CompositeUnit",
            "sourceId" : ObjectId("52dffb54da06422559386f57")
        }
    ]
}
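For illustration, the conversion from the original layout to this one might look roughly like the sketch below. The function names are hypothetical, and the behaviour for an empty accessMap (no "allow" or "deny" at all) is inferred from the sample document above.

```python
def invert_access_map(access_map):
    """Turn {subject: {"allow": [perms]}} into {"allow": {perm: [subjects]}, "deny": {...}}."""
    inverted = {"allow": {}, "deny": {}}
    # Sort by subject so each permission lists its subjects in a stable order.
    for subject, rights in sorted(access_map.items()):
        for effect in ("allow", "deny"):
            for permission in rights.get(effect, []):
                inverted[effect].setdefault(permission, []).append(subject)
    return inverted


def restructure_unit(unit):
    """Replace "acls" with "accessMaps", keeping the remaining fields as they are."""
    access_maps = []
    for acl in unit.get("acls", []):
        entry = {k: v for k, v in acl.items() if k != "accessMap"}
        if acl.get("accessMap"):
            entry.update(invert_access_map(acl["accessMap"]))
        access_maps.append(entry)
    result = {k: v for k, v in unit.items() if k != "acls"}
    result["accessMaps"] = access_maps
    return result


original_unit = {
    "content": "various stuff",
    "acls": [
        {"accessMap": {},
         "sourceClass": "com.bulb.learn.domain.units.PublishedPageUnit"},
        {"accessMap": {
            "Role:USER": {"allow": ["READ"]},
            "Account:52d96bfada0695fcbdb41daf": {
                "allow": ["CREATE", "READ", "UPDATE", "DELETE", "GRANT"]},
         },
         "sourceClass": "com.bulb.learn.domain.units.CompositeUnit"},
    ],
}

restructured_unit = restructure_unit(original_unit)
```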

The new mapping looks like this:

{
  "unit": {
    "properties": {
      "accessMaps": {
        "type": "nested",
        "properties": {
          "allow": {
            "type": "nested",
            "properties": {
              "CREATE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "DELETE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "GRANT": {
                "type": "string",
                "index": "not_analyzed"
              },
              "READ": {
                "type": "string",
                "index": "not_analyzed"
              },
              "UPDATE": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "deny": {
            "type": "nested",
            "properties": {
              "CREATE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "DELETE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "GRANT": {
                "type": "string",
                "index": "not_analyzed"
              },
              "READ": {
                "type": "string",
                "index": "not_analyzed"
              },
              "UPDATE": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "sourceClass": {
            "type": "string"
          },
          "sourceId": {
            "type": "string"
          }
        }
      }
    }
  }
}

The filtered query looks like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "nested": {
              "path": "accessMaps.allow",
              "filter": {
                "terms": {
                  "accessMaps.allow.READ": [
                    "Role:REGISTERED_USER",
                    "Account:52e6a361da06e4eb64172519",
                    "Role:USER",
                    "Group:52cb057cda06ca463e78f0d7"
                  ]
                }
              }
            }
          },
          "must_not": {
            "nested": {
              "path": "accessMaps.deny",
              "filter": {
                "terms": {
                  "accessMaps.deny.READ": [
                    "Role:REGISTERED_USER",
                    "Account:52e6a361da06e4eb64172519",
                    "Role:USER",
                    "Group:52cb057cda06ca463e78f0d7"
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}

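The biggest problems I ran into were working out how to use the "path" property in the nested filter, and that the field names in the terms filter have to be fully qualified. I wish ElasticSearch would put more effort into their documentation.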
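A small sketch of how the query above could be generated from the user's authorization array, keeping the "path" and the fully qualified field name in sync. The function names are hypothetical; the field names come from the mapping above.

```python
def permission_clause(effect, permission, subjects):
    """Nested filter on accessMaps.allow or accessMaps.deny.

    The terms field is the path plus the permission name, so it is
    always fully qualified (for example accessMaps.allow.READ).
    """
    path = f"accessMaps.{effect}"
    return {
        "nested": {
            "path": path,
            "filter": {"terms": {f"{path}.{permission}": list(subjects)}},
        }
    }


def build_filtered_query(subjects, permission="READ"):
    """Filtered query: the permission must be allowed and must not be denied."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": permission_clause("allow", permission, subjects),
                        "must_not": permission_clause("deny", permission, subjects),
                    }
                },
            }
        }
    }


query = build_filtered_query([
    "Role:REGISTERED_USER",
    "Account:52e6a361da06e4eb64172519",
    "Role:USER",
    "Group:52cb057cda06ca463e78f0d7",
])
```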