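For a database test, I need to generate queries. To reduce complexity, let's assume there are only "insert" and "select" queries and that we only store integers up to 2^64. Entries in the database are organized in two levels: main keys and cluster keys. Each main key can contain up to 2^64 unique cluster keys and up to 2^64 unique data items.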
For each insert query, two chance values are given (main_chance and cluster_chance in the code below).
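I also have a pseudo-random number generator and the number of items generated so far. This number is also used to seed the random generator when a new item is created. See the code for how I tried to do this: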
from random import Random

def generate_seeds(main_chance, cluster_chance, max_generated):
    generator = Random()
    new = main_chance > generator.random()
    # increase the counter if a new item is generated
    max_generated += new
    # We chose "insert", so a new item needs to be generated
    if new:
        main_key = max_generated
        # seed the generator with that main_key
        generator.seed(main_key)
        # now determine if a whole new item will be generated
        # or an old key gets new additional items
        # Save the main seed. In case we just add an item
        # the main seed will be an old one and the main
        # seed will only be used for the new items.
        cluster_key = main_key
        add_item = cluster_chance > generator.random()
        # check if a completely new item will be generated
        if (not add_item) and new:
            return main_key, main_key, max_generated
        # We need an old main key that created a new item, so iterate
        # over the old keys until we find one that did. If no key was
        # ever used to create a completely new item, fall back to
        # seed zero, which always generates a completely new item.
        if add_item:
            # if the cluster_chance is big we might iterate very often :(
            for main_key in generator.sample(xrange(main_key), main_key):
                generator.seed(main_key)
                if cluster_chance < generator.random() or \
                        cluster_key == 0:
                    break
            else:
                # special case: no items have been generated yet
                main_key = 0
            return main_key, cluster_key, max_generated
    else:
        # The choice was "select", regenerate an old item
        choice = generator.randint(0, max_generated)
        return generate_seeds(1, cluster_chance, choice)
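
The seeding idea behind this code, that an item can be regenerated from its number alone, can be shown in isolation. The following is only a minimal sketch: regenerate_item and its count parameter are hypothetical names that do not appear in the code above, and drawing three values per item is an arbitrary choice for illustration.

```python
from random import Random

def regenerate_item(item_number, count=3):
    """Hypothetical helper: derive an item's values from its number alone."""
    generator = Random(item_number)  # same seed -> same sequence
    # Draw `count` integers from the 64-bit range used in the question.
    return [generator.randrange(2 ** 64) for _ in range(count)]

first = regenerate_item(42)
again = regenerate_item(42)
other = regenerate_item(43)

print(first == again)  # compares two regenerations of the same item
print(first == other)  # compares two different items
```

Because the values follow from the seed, a "select" query only needs the item number to rebuild what an earlier "insert" wrote; nothing has to be stored on the test side.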
Problem: the for loop after add_item may go through many iterations, and this becomes more likely the larger cluster_chance is.
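
To get a feeling for how bad this can become, the search can be simulated on its own. The sketch below is an approximation under stated assumptions: it treats the first draw of each seeded generator as independent, so a candidate key qualifies with probability 1 - cluster_chance and roughly 1 / (1 - cluster_chance) candidates are tried on average. The names candidates_tried and average_tries, the key count, and the run count are all made up for this experiment.

```python
from random import Random

def candidates_tried(cluster_chance, key_count, picker):
    """Count how many old keys are seeded before one qualifies."""
    tried = 0
    for key in picker.sample(range(key_count), key_count):
        tried += 1
        # Same test as in the question: the key's first draw decides
        # whether that key created a completely new item.
        if cluster_chance < Random(key).random():
            break
    return tried

def average_tries(cluster_chance, key_count=2000, runs=300):
    picker = Random(0)  # fixed seed so the experiment is repeatable
    total = sum(candidates_tried(cluster_chance, key_count, picker)
                for _ in range(runs))
    return total / float(runs)

for chance in (0.5, 0.9, 0.99):
    # measured average next to the rough estimate 1 / (1 - chance)
    print(chance, average_tries(chance), 1.0 / (1.0 - chance))
```

Note that sample() over the whole key range also builds a complete shuffled list before the first iteration, so each search costs time and memory proportional to the number of keys even when the first candidate qualifies.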
How can I solve this in a better way?
Edit: The only idea I have come up with is to build a list of ints, where list[n] is:
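The problem is that this solution uses a lot of memory: d = [x for x in xrange(100000000)] (100 million values) uses 3,183,344 KiB of memory, which is about 32.6 bytes per value, or 32,939,450 values per GiB. So with 32 GiB of RAM, about 1 billion values can be managed. That is nice, but not good enough.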
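
For comparison, most of those 32.6 bytes come from the list storing one pointer per element plus a separate int object for each value. A typed array stores the raw 8-byte integers instead. The sketch below assumes Python 3.3 or later (for the 'Q' typecode of the standard-library array module) and uses a scaled-down element count; it only measures the two layouts and is not meant as a complete solution to the key-lookup problem.

```python
import sys
from array import array

count = 1000000  # scaled down from the 100 million in the question

as_list = list(range(count))
as_array = array('Q', range(count))  # 'Q': unsigned 64-bit integers

# Rough estimate: the list's pointer table plus every int object it refers to.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# The array keeps the raw values in one contiguous buffer.
array_bytes = sys.getsizeof(as_array)

print("list : %.1f bytes per value" % (list_bytes / float(count)))
print("array: %.1f bytes per value" % (array_bytes / float(count)))
```

At 8 bytes per value, the raw data for 1 billion values would take about 8 GB instead of roughly 32 GiB, at the cost of only being able to store integers that fit in 64 bits.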