Question

I am trying to implement this sample:

https://github.com/Azure/azure-documentdb-python/blob/master/samples/DatabaseManagement/Program.py

to fetch data from Azure DocumentDB and do some visualization. However, I would like to use a query on the line marked #error here.

import itertools

import pydocumentdb.errors as errors

# database_id and database_collection are assumed to be defined elsewhere
def read_database(client, id):
    print('3. Read a database by id')

    try:
        # Look up the database and collection by id, then read the first 100 documents
        db = next((data for data in client.ReadDatabases() if data['id'] == database_id))
        coll = next((coll for coll in client.ReadCollections(db['_self']) if coll['id'] == database_collection))
        return list(itertools.islice(client.ReadDocuments(coll['_self']), 0, 100, 1))

    except errors.DocumentDBError as e:
        if e.status_code == 404:
            print('A Database with id \'{0}\' does not exist'.format(id))
        else:
            raise errors.HTTPFailure(e.status_code)

Fetching is really slow when I want to get more than 10k items. How can I improve this?

Thanks!

Answer 1

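You cannot query documents directly through the database entity.

The parameters of the ReadDocuments() method used in your code should be the collection link and the query options.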

def ReadDocuments(self, collection_link, feed_options=None):
    """Reads all documents in a collection.

    :Parameters:
        - `collection_link`: str, the link to the document collection.
        - `feed_options`: dict

    :Returns:
        query_iterable.QueryIterable

    """
    if feed_options is None:
        feed_options = {}

    return self.QueryDocuments(collection_link, None, feed_options)

So, you can modify your code as follows:

import pydocumentdb.document_client as document_client
import pydocumentdb.errors as errors

# Initialize the Python DocumentDB client
# (config is assumed to hold your account ENDPOINT and MASTERKEY)
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})

db = "db"
coll = "coll"

try:
    # Resources can be addressed directly by id-based links
    database_link = 'dbs/' + db
    database = client.ReadDatabase(database_link)

    collection_link = 'dbs/' + db + "/colls/" + coll
    collection = client.ReadCollection(collection_link)

    # options = {}
    # options['enableCrossPartitionQuery'] = True
    # options['partitionKey'] = 'jay'
    docs = client.ReadDocuments(collection_link)
    print(list(docs))

except errors.DocumentDBError as e:
    if e.status_code == 404:
        print('A Database with id \'{0}\' does not exist'.format(db))
    else:
        raise errors.HTTPFailure(e.status_code)

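If you want to query a partition of the collection, add the snippet that is commented out in the code above and pass options as the second argument to ReadDocuments():

options = {}
options['enableCrossPartitionQuery'] = True
options['partitionKey'] = 'jay'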
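
Because the options dictionary is just the second argument of ReadDocuments(), it can also carry paging settings. Here is a minimal sketch, assuming the `maxItemCount` feed option sets the page size per request. The helper name `read_documents` and its parameters are hypothetical, and `client` is expected to be a DocumentClient like the one created above:

```python
import itertools


def read_documents(client, collection_link, partition_key=None, page_size=None, limit=None):
    """Read documents from a collection, optionally scoped to one partition key value."""
    options = {}
    if partition_key is not None:
        # Restrict the read to a single partition key value
        options['partitionKey'] = partition_key
    if page_size is not None:
        # Ask the service for up to page_size documents per round trip (assumed feed option)
        options['maxItemCount'] = page_size

    docs = client.ReadDocuments(collection_link, options)
    # The result is iterated lazily; islice stops pulling documents after `limit` items
    # (limit=None reads everything)
    return list(itertools.islice(docs, limit))
```

For example, `read_documents(client, collection_link, partition_key='jay', page_size=1000, limit=100)` would return at most 100 documents from that partition. Whether a larger page size actually helps depends on document size and network latency, so it is worth measuring.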

Your question seems to focus on Azure Cosmos DB query performance.

You can refer to the following points to improve query performance.

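Partitioning

You can set a partition key in your database and query with a filter clause on a single partition key value, which gives lower latency and consumes fewer RUs.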
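
A single-partition query of that kind could look like the sketch below. It filters on the partition key with a parameterized query through QueryDocuments() and also passes the key value in the options. The property name `userId` and the helper name are hypothetical; substitute your collection's actual partition key path.

```python
def query_by_partition_key(client, collection_link, key_value):
    """Query only the documents whose partition key (assumed here: userId) equals key_value."""
    query = {
        'query': 'SELECT * FROM c WHERE c.userId = @key',
        'parameters': [{'name': '@key', 'value': key_value}],
    }
    # Supplying partitionKey scopes the query to one partition key value,
    # so enableCrossPartitionQuery is not needed
    options = {'partitionKey': key_value}
    return list(client.QueryDocuments(collection_link, query, options))
```

Usage would be along the lines of `query_by_partition_key(client, 'dbs/db/colls/coll', 'jay')`.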

Throughput

You can provision higher throughput so that Azure Cosmos DB can serve more requests per unit of time. Of course, this also means higher cost.
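
Throughput can also be changed from the SDK rather than the portal. The sketch below assumes the collection's offer stores its value under `content.offerThroughput` and uses the client's QueryOffers() and ReplaceOffer() methods. The helper name is hypothetical, and `collection` is the dictionary returned by ReadCollection():

```python
def set_collection_throughput(client, collection, new_throughput):
    """Replace the throughput (RU/s) of the offer attached to `collection`."""
    query = {
        'query': 'SELECT * FROM c WHERE c.resource = @link',
        'parameters': [{'name': '@link', 'value': collection['_self']}],
    }
    offers = list(client.QueryOffers(query))
    if not offers:
        raise LookupError('No offer found for collection {0}'.format(collection['id']))

    offer = offers[0]
    # Assumes the offer keeps its throughput under content.offerThroughput
    offer['content']['offerThroughput'] = new_throughput
    return client.ReplaceOffer(offer['_self'], offer)
```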

Indexing policy

Using indexing paths can give you better performance and lower latency.
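
As an illustration, an indexing policy is part of the collection definition. The sketch below builds a definition that indexes only selected paths and excludes everything else. The helper name and the example path `/userId/?` are hypothetical, and the index kinds and precision values are assumptions that should be checked against the indexing policy documentation for your account:

```python
def collection_definition_with_index_paths(collection_id, indexed_paths):
    """Build a collection definition that indexes only the given paths."""
    included = [
        {
            'path': path,
            'indexes': [
                {'kind': 'Range', 'dataType': 'String', 'precision': -1},
                {'kind': 'Range', 'dataType': 'Number', 'precision': -1},
            ],
        }
        for path in indexed_paths
    ]
    return {
        'id': collection_id,
        'indexingPolicy': {
            'indexingMode': 'consistent',
            'includedPaths': included,
            # Everything not listed above is left out of the index
            'excludedPaths': [{'path': '/*'}],
        },
    }
```

The resulting dictionary could then be passed to something like `client.CreateCollection('dbs/db', collection_definition_with_index_paths('coll', ['/userId/?']))`. Queries that filter on excluded paths would no longer be served from the index, so only exclude what you never query on.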

For details, I recommend referring to the official performance documentation.

Hope it helps you.

How to get data from Azure DocumentDB faster
