Question

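I'm using describe_table_statistics to retrieve the list of tables in a given DMS task, and conditionally looping over describe_table_statistics with response['Marker'].

When I use no filters, I get the correct number of records (13k+). When I use a filter or combination of filters whose result set is smaller than MaxRecords, I also get the correct number of records.

However, when I pass a filter that matches more records than MaxRecords, I get far fewer records than I should.

Here is my function for retrieving the set of tables: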

def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
   tables=[]
   max_records=500

   filters=[]
   if schema_name:
      filters.append({'Name':'schema-name', 'Values':[schema_name]})
   if table_state:
      filters.append({'Name':'table-state', 'Values':[table_state]})

   task_arn = get_dms_task_arn(account, region, task_name)

   session = boto3.Session(profile_name=account, region_name=region)
   client = session.client('dms')

   response = client.describe_table_statistics(
      ReplicationTaskArn=task_arn
      ,Filters=filters
      ,MaxRecords=max_records)

   tables += response['TableStatistics']

   while len(response['TableStatistics']) == max_records:
      response = client.describe_table_statistics(
         ReplicationTaskArn=task_arn
         ,Filters=filters
         ,MaxRecords=max_records
         ,Marker=response['Marker'])

      tables += response['TableStatistics']

   return tables

To troubleshoot, I looped over the returned tables and printed one line per table:

        print(', '.join((
            t['SchemaName']
            ,t['TableName']
            ,t['TableState'])))

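When I pass no filters and grep the output for the 'Table completed' table state, I get 12k+ records, which matches the count shown in the console.

So, at least on the surface, the response loop works.

When I pass both the schema name and table state filters, I get the correct count, confirmed by the console, but that count is smaller than MaxRecords.

When I pass only the table-state filter for 'Table completed', I get just 949 records, so I'm missing about 11k records.

I tried omitting the Filters parameter from the describe_table_statistics call inside the loop, but I get the same results in all cases.

I suspect something is wrong with my call to describe_table_statistics inside the loop, but I can't find an example of this in Amazon's documentation to confirm it.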

Answer 1

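When filters are applied, describe_table_statistics does not adhere to the MaxRecords limit.

In fact, what it seems to do is retrieve (2 x MaxRecords) records, apply the filters, and return that set. Or possibly it retrieves MaxRecords records, applies the filters, and keeps going until the result set is larger than MaxRecords. Either way, a page can contain a number of records other than MaxRecords even when more pages follow, so my while condition was the problem.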

I replaced

while len(response['TableStatistics']) == max_records:

with

while 'Marker' in response:

and now the function returns the correct number of records.
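The same idea can be written as a small helper that takes the client as a parameter, which makes it easy to check against a stub. This is only a sketch: collect_table_statistics and FakeDmsClient are hypothetical names, and the stub's marker format is made up for illustration. The stub deliberately returns pages whose sizes differ from MaxRecords to show that only the presence of Marker decides whether to continue.

```python
def collect_table_statistics(client, task_arn, filters=None, max_records=500):
    """Collect every TableStatistics entry by following Marker until it is absent."""
    kwargs = {"ReplicationTaskArn": task_arn, "MaxRecords": max_records}
    if filters:
        kwargs["Filters"] = filters

    tables = []
    while True:
        response = client.describe_table_statistics(**kwargs)
        tables.extend(response.get("TableStatistics", []))
        marker = response.get("Marker")
        if not marker:
            # No Marker in the response means this was the last page.
            break
        kwargs["Marker"] = marker
    return tables


class FakeDmsClient:
    """Hypothetical stand-in for the boto3 DMS client; serves uneven pages."""

    def __init__(self, pages):
        self._pages = pages

    def describe_table_statistics(self, **kwargs):
        # The made-up marker is simply the index of the next page.
        index = int(kwargs.get("Marker", 0))
        response = {"TableStatistics": self._pages[index]}
        if index + 1 < len(self._pages):
            response["Marker"] = str(index + 1)
        return response


# The second page holds fewer than max_records entries, yet more pages follow.
pages = [
    [{"TableName": "a"}, {"TableName": "b"}],
    [{"TableName": "c"}],
    [{"TableName": "d"}],
]
result = collect_table_statistics(FakeDmsClient(pages), "arn:aws:dms:example", max_records=2)
print(len(result))
```

A loop that stopped as soon as a page held fewer than max_records entries would have returned only the first three records here.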

Incidentally, my first attempt was

while len(response['TableStatistics']) >= 1:

but on the last iteration of the loop it raised this error:

KeyError: 'Marker'
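The error suggests that the final page simply has no 'Marker' key, so indexing the response with it fails. A minimal illustration, using a hand-written dictionary as a hypothetical final page rather than a real API response:

```python
# Hypothetical final page: it has records but no 'Marker' key.
last_page = {"TableStatistics": [{"TableName": "t1"}]}

try:
    last_page["Marker"]  # direct indexing fails when the key is missing
    raised = False
except KeyError:
    raised = True

# dict.get returns None instead of raising, so it is safe to test.
next_marker = last_page.get("Marker")
print(raised, next_marker)
```

Checking 'Marker' in response, or response.get('Marker'), avoids the exception.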

The finished, working function now looks like this:

import boto3

def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
   tables=[]
   max_records=500

   # Build the optional filters
   filters=[]
   if schema_name:
      filters.append({'Name':'schema-name', 'Values':[schema_name]})
   if table_state:
      filters.append({'Name':'table-state', 'Values':[table_state]})

   # get_dms_task_arn is defined elsewhere (not shown)
   task_arn = get_dms_task_arn(account, region, task_name)

   session = boto3.Session(profile_name=account, region_name=region)
   client = session.client('dms')

   # First page
   response = client.describe_table_statistics(
      ReplicationTaskArn=task_arn
      ,Filters=filters
      ,MaxRecords=max_records)

   tables += response['TableStatistics']

   # Keep requesting pages for as long as the response carries a Marker
   while 'Marker' in response:
      response = client.describe_table_statistics(
         ReplicationTaskArn=task_arn
         ,Filters=filters
         ,MaxRecords=max_records
         ,Marker=response['Marker'])

      tables += response['TableStatistics']

   return tables
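As a possible alternative, boto3 paginators handle the Marker bookkeeping internally. The sketch below assumes the installed boto3 version provides a paginator for describe_table_statistics, which it checks at runtime with can_paginate. The function name get_tables_with_paginator is hypothetical, and I have not verified this against a task with 13k tables.

```python
def get_tables_with_paginator(client, task_arn, filters=None, page_size=500):
    """Collect all TableStatistics entries using a boto3 paginator (sketch)."""
    # Assumption: the client exposes a paginator for this operation.
    if not client.can_paginate("describe_table_statistics"):
        raise RuntimeError("no paginator available for describe_table_statistics")

    paginator = client.get_paginator("describe_table_statistics")
    kwargs = {
        "ReplicationTaskArn": task_arn,
        # PageSize is passed through to the service as MaxRecords.
        "PaginationConfig": {"PageSize": page_size},
    }
    if filters:
        kwargs["Filters"] = filters

    tables = []
    for page in paginator.paginate(**kwargs):
        tables.extend(page.get("TableStatistics", []))
    return tables
```

It would be called with the same client and task_arn as in the function above, for example get_tables_with_paginator(client, task_arn, filters=filters).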

AWS DMS: DatabaseMigrationService.Client.describe_table_statistics loses records with large result sets
