BigTable:2个写入相同的键,但有3个版本

时间:2019-03-20 18:55:49

标签: bigtable google-cloud-bigtable

有时候,如果我在同一个行键上写多个版本,并且在多个批处理突变中具有多个列族(每个版本都与多次写操作一起批处理)。

enter image description here

这是由于数据压缩引起的预期行为吗?将来会删除多余的版本吗?

1 个答案:

答案 0 :(得分:1)

这里的问题是,您要将两列放在批处理中的两个单独的条目中,这意味着即使它们具有相同的行,也不会自动应用。

批处理条目可以分别成功或失败,然后客户端将仅重试失败的条目。例如,如果一个条目成功而另一个超时,但后来又无声地成功,则重试“失败”的条目可能会导致您看到部分写入结果。

因此,在python中,您应该执行以下操作(改编自cloud.google.com/bigtable/docs/samples-python-hello):

print('Writing some greetings to the table.')
greetings = ['Hello World!', 'Hello Cloud Bigtable!', 'Hello Python!']
rows = []
column1 = 'greeting1'.encode()
column1 = 'greeting2'.encode()
for i, value in enumerate(greetings):
    # Note: This example uses sequential numeric IDs for simplicity,
    # but this can result in poor performance in a production
    # application.  Since rows are stored in sorted order by key,
    # sequential keys can result in poor distribution of operations
    # across nodes.
    #
    # For more information about how to design a Bigtable schema for
    # the best performance, see the documentation:
    #
    #     https://cloud.google.com/bigtable/docs/schema-design
    row_key = 'greeting{}'.format(i).encode()
    row = table.row(row_key)

    # **Multiple calls to 'set_cell()' are allowed on the same batch
    # entry. Each entry will be applied atomically, but a separate
    # 'row' in the same batch will be applied separately even if it
    # shares its row key with another entry.**
    row.set_cell(column_family_id,
                 column1,
                 value,
                 timestamp=datetime.datetime.utcnow())
    row.set_cell(column_family_id,
                 column2,
                 value,
                 timestamp=datetime.datetime.utcnow())
    rows.append(row)
table.mutate_rows(rows)