Question

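I'm working on an API written in Python that accepts JSON payloads from clients, applies some validation, and stores the payloads in MongoDB so that they can be processed asynchronously.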

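However, I'm running into trouble with payloads that (legitimately) contain keys that start with $ and/or contain a . character. According to the MongoDB documentation, my best bet is to escape these characters: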

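In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. "＄") and U+FF0E (i.e. "．").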
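To make the quoted suggestion concrete, here is a minimal sketch of the plain substitution it describes. The function names are made up for illustration and are not part of any MongoDB or PyMongo API.

```python
# Fullwidth stand-ins suggested by the quoted documentation.
FULLWIDTH = str.maketrans({'$': '\uff04', '.': '\uff0e'})


def escape_key(key):
    """Replace every '$' and '.' in a key with its fullwidth equivalent."""
    return key.translate(FULLWIDTH)


def escape_keys(value):
    """Recursively escape the keys of nested dicts (and dicts inside lists)."""
    if isinstance(value, dict):
        return {escape_key(k): escape_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [escape_keys(v) for v in value]
    return value


payload = {'$set': {'user.name': 'alice'}, 'tags': [{'a.b': 1}]}
escaped = escape_keys(payload)
```

Only keys are rewritten; values such as 'alice' are left alone.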

Fair enough, but here's where it gets interesting. I'd like this process to be transparent to the application, so:

When the document is retrieved, the keys should be unescaped...
...but only the keys that needed to be escaped in the first place.

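For example, suppose a (malicious) user sends a JSON payload that contains a key like \uff04mixed.chars. When the application fetches this document from the storage backend, this key should be converted back into \uff04mixed.chars, not $mixed.chars.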
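A small sketch of why the plain substitution is not enough, assuming the key in question already begins with the fullwidth U+FF04 character. The helper names are hypothetical.

```python
ESCAPE = str.maketrans({'$': '\uff04', '.': '\uff0e'})
UNESCAPE = str.maketrans({'\uff04': '$', '\uff0e': '.'})


def naive_escape(key):
    """Swap reserved characters for their fullwidth equivalents."""
    return key.translate(ESCAPE)


def naive_unescape(key):
    """Blindly swap every fullwidth character back."""
    return key.translate(UNESCAPE)


# The key already starts with U+FF04 before any escaping happens.
original = '\uff04mixed.chars'
stored = naive_escape(original)      # only the '.' is replaced
restored = naive_unescape(stored)    # the leading U+FF04 is "restored" too

# The round trip is lossy: the key comes back starting with a real '$'.
print(restored == original)
```

The unescape step cannot tell which fullwidth characters were introduced by escaping and which were in the key to begin with, so it changes both.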

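My main concern is information leakage; I don't want anyone to discover that the application requires special handling for the $ and . characters. The bad guys probably know how to exploit MongoDB far better than I know how to secure it, so I don't want to take any chances.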

Answer 1

Here's the approach I ended up going with:

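Before inserting a document into Mongo, run it through a SONManipulator that searches for and escapes any illegal keys in the document.
- The original keys are stored as a separate attribute in the document so that we can restore them later.
After retrieving a document from Mongo, run it through the SONManipulator to extract the original keys and restore them.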

Here's a brief example:

# Example of a document with naughty keys.
document = {
    '$foo': 'bar',
    '$baz': 'luhrmann'
}

##
# Before inserting the document, we must first run it through our
#   SONManipulator.
manipulator = KeyEscaper()
escaped = manipulator.transform_incoming(document, collection.name)

# Now we can insert the document.
document_id = collection.insert_one(escaped).inserted_id

##
# Later, we retrieve the document.
raw = collection.find_one({'_id': document_id})

# Run the document through our KeyEscaper to restore the original
#   keys.
unescaped = manipulator.transform_outgoing(raw, collection.name)

assert unescaped == document
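The KeyEscaper class used above is not shown in this answer (it lives in the linked gist). As a rough idea of the technique, here is a minimal sketch written as two plain functions rather than a SONManipulator subclass. The function names, the flat placeholder-to-key mapping, and the decision to skip dicts inside lists are all assumptions made for illustration; the gist's actual implementation may differ.

```python
MAGIC = '__escaped__'  # reserved attribute name, mirroring the stored document


def needs_escape(key):
    """Keys MongoDB may reject, plus anything that could clash with MAGIC."""
    return key.startswith('$') or '.' in key or key.startswith(MAGIC)


def escape(doc):
    """Return a copy of doc with illegal keys replaced by placeholders.

    The original keys are recorded under MAGIC so they can be restored.
    """
    out, originals = {}, {}
    for key, value in doc.items():
        if isinstance(value, dict):
            value = escape(value)
        if needs_escape(key):
            placeholder = '{}{}'.format(MAGIC, len(originals))
            originals[placeholder] = key
            key = placeholder
        out[key] = value
    if originals:
        out[MAGIC] = originals
    return out


def unescape(doc):
    """Reverse escape(): put the recorded original keys back."""
    originals = doc.get(MAGIC, {})
    out = {}
    for key, value in doc.items():
        if key == MAGIC:
            continue
        if isinstance(value, dict):
            value = unescape(value)
        out[originals.get(key, key)] = value
    return out


doc = {'$foo': 'bar', 'nested': {'a.b': 1}, '\uff04mixed.chars': 2}
assert unescape(escape(doc)) == doc
```

Because the original keys are stored verbatim instead of being derived from the placeholder, a key that already contains fullwidth characters comes back unchanged.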

The actual document stored in MongoDB looks like this:

{
  "_id": ObjectId('582cebe5cd9b344c814d98e3'),

  "__escaped__1": "luhrmann",
  "__escaped__0": "bar",

  "__escaped__": {
    "__escaped__1": ["$baz", {}],
    "__escaped__0": ["$foo", {}]
  }
}

Note the __escaped__ attribute, which contains the original keys so that they can be restored when the document is retrieved.
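As a rough illustration of how the original keys could be recovered from that layout, here is a sketch for a flat document. It assumes each entry under __escaped__ is an [original key, nested mapping] pair and ignores the second element, which is empty in this example; restore_flat is a hypothetical helper, and the _id field is left out to keep the snippet free of dependencies.

```python
stored = {
    '__escaped__1': 'luhrmann',
    '__escaped__0': 'bar',
    '__escaped__': {
        '__escaped__1': ['$baz', {}],
        '__escaped__0': ['$foo', {}],
    },
}


def restore_flat(doc, magic='__escaped__'):
    """Rebuild the original keys of a flat document from its magic attribute."""
    mapping = doc.get(magic, {})
    restored = {}
    for key, value in doc.items():
        if key == magic:
            continue  # bookkeeping only, not part of the original payload
        # Keys without an entry in the mapping were never escaped.
        original, _nested = mapping.get(key, (key, {}))
        restored[original] = value
    return restored


restored = restore_flat(stored)
```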

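This makes querying against escaped keys a bit tricky, but that's infinitely preferable to not being able to store the document in the first place.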

Full code, with unit tests and example usage:
https://gist.github.com/todofixthis/79a2f213989a3584211e49bfba582b40

Transparently encoding/decoding `$` and `.` when inserting/retrieving documents in MongoDB
