Counting items with Scan + FilterExpression -> overcoming ProvisionedThroughputExceededException

Time: 2015-10-07 00:39:14

Tags: python amazon-dynamodb boto

In a DynamoDB table of 10 million records, where every item has an "epoch" timestamp attribute, I am trying to count the items whose epoch falls between two values. The table's provisioned read capacity is 1,000 read capacity units, and each item is about 5-7 KB.
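As a rough back-of-the-envelope estimate on my part (assuming an average item size of about 6 KB, which the question only gives as a 5-7 KB range), the volume of data a full Scan has to read is:

items = 10 * 1000 * 1000                            # 10 million items
avg_item_kb = 6                                     # each item is roughly 5-7 KB
total_gb = items * avg_item_kb / 1024.0 / 1024.0    # ~57 GB in total
pages = items * avg_item_kb / 1024.0                # Scan returns at most 1 MB per page -> ~58,000 pages

So even a filtered count has to page through tens of thousands of 1 MB Scan pages; the filter expression only reduces what is returned, not what is read.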

Code:

from boto3.session import Session
from boto3.dynamodb.conditions import Attr

START_EPOCH = 1443657600000            # range boundaries: epoch timestamps in milliseconds
END_EPOCH = 1443744000000
TOTAL_ITEMS_TO_SCAN = 1000000          # stop once this many matching items have been counted
F_EXP = Attr('epoch').gt(START_EPOCH) & Attr('epoch').lt(END_EPOCH)

session = Session(aws_access_key_id='access_key',
                  aws_secret_access_key='secret_key',
                  region_name='region')
resource = session.resource('dynamodb')
table = resource.Table('table name')

def scan_func(last_key, counter, scanned):
    # Scan one page; with Select='COUNT' only Count/ScannedCount come back,
    # but read capacity is still consumed for every item examined.
    if last_key:
        result = table.scan(FilterExpression=F_EXP,
                            Select='COUNT',
                            ExclusiveStartKey=last_key)
    else:
        result = table.scan(FilterExpression=F_EXP,
                            Select='COUNT')
    counter += result['Count']
    scanned += result['ScannedCount']
    print("Current items found {} from {} scanned".format(counter, scanned))

    # Keep paging while the target has not been reached and there is more data.
    if counter < TOTAL_ITEMS_TO_SCAN and 'LastEvaluatedKey' in result:
        scan_func(result['LastEvaluatedKey'], counter, scanned)
    else:
        print('Total items found: {}, from {} scanned'.format(counter, scanned))

scan_func(None, 0, 0)
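
For reference, here is a minimal sketch of the segmented (parallel) scan variant referred to below; it only illustrates the Segment/TotalSegments parameters of Scan, reusing the table and F_EXP objects defined above, and is not the exact code that was run:

def scan_segment(segment, total_segments):
    # One worker of a parallel scan; each worker reads its own slice of the table.
    kwargs = dict(FilterExpression=F_EXP,
                  Select='COUNT',
                  Segment=segment,
                  TotalSegments=total_segments)
    counter = scanned = 0
    while True:
        result = table.scan(**kwargs)
        counter += result['Count']
        scanned += result['ScannedCount']
        if 'LastEvaluatedKey' not in result:
            return counter, scanned
        kwargs['ExclusiveStartKey'] = result['LastEvaluatedKey']

All segments draw from the same table-level capacity pool, so running several of them in parallel only increases the consumption rate unless each worker is throttled individually.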

Even when I run the scan with segments (a parallel scan), on average I get the following response after a few iterations:

botocore.exceptions.ClientError: An error occurred
(ProvisionedThroughputExceededException) when calling the Scan operation: 
The level of configured provisioned throughput for the table was exceeded. 
Consider increasing your provisioning level with the UpdateTable API
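
The error is consistent with how Scan is billed (a rough calculation on my part, assuming eventually consistent reads): capacity is charged for every item examined, not for the items that match the filter, and each full 1 MB page costs about 128 read units:

page_kb = 1024                            # a Scan request reads at most 1 MB
rcu_per_page = page_kb / 4.0 * 0.5        # 0.5 RCU per 4 KB (eventually consistent) -> 128 RCU
pages_per_second = 1000 / rcu_per_page    # ~7.8 pages per second sustainable at 1,000 RCU

A tight recursive loop issues pages much faster than that, so once the table's burst allowance is used up, requests start failing with ProvisionedThroughputExceededException.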

The best result I have gotten so far is:

Current items found 16 from 3245 scanned

I also tried adding a 2-second sleep between iterations, to give the table room to recover and free up provisioned capacity, and that did not work either. I also tried tripling the provisioned capacity from 1,000 to 3,000 read units; that ran for a few more iterations but eventually stopped as well.
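For completeness, one common mitigation pattern (a sketch under my own assumptions, not something the question shows; it reuses the table and F_EXP objects defined above) is to pace the requests, back off exponentially on the throttling error, and ask DynamoDB for the consumed capacity of each page via ReturnConsumedCapacity:

import time
from botocore.exceptions import ClientError

def paced_count(sleep_seconds=0.2, max_backoff=30):
    counter = scanned = 0
    kwargs = dict(FilterExpression=F_EXP,
                  Select='COUNT',
                  ReturnConsumedCapacity='TOTAL')
    backoff = 1
    while True:
        try:
            result = table.scan(**kwargs)
        except ClientError as e:
            if e.response['Error']['Code'] != 'ProvisionedThroughputExceededException':
                raise
            time.sleep(backoff)                  # back off and retry the same page
            backoff = min(backoff * 2, max_backoff)
            continue
        backoff = 1
        counter += result['Count']
        scanned += result['ScannedCount']
        print("Consumed {} RCU on this page".format(
            result['ConsumedCapacity']['CapacityUnits']))
        if 'LastEvaluatedKey' not in result:
            return counter, scanned
        kwargs['ExclusiveStartKey'] = result['LastEvaluatedKey']
        time.sleep(sleep_seconds)                # pace requests below the provisioned rate

Pacing only keeps the scan under the provisioned rate; it does not make the count itself cheaper, so at roughly 7-8 pages per second a full pass over the table would still take on the order of two hours.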

Any ideas on how to make this work? Are there alternatives that do not require increasing the table's read capacity?

0 Answers:

No answers yet.