Question

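I'm writing an application that generates between 10 thousand and 100 million integers, and I'm not sure whether a .txt file is the right representation for saving them. Here is my code: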

import random

def printrandomInts(n, file):
    # Write n random integers in [0, 10000000) to file, one per line.
    for i in range(n):
        x = random.random()
        x = x * 10000000
        x = int(x)
        file.write(str(x))
        file.write("\n")
    file.close()

file = open("10k","w")

n = 10000
printrandomInts(n,file)
file = open("100k","w")
n*=10
printrandomInts(n,file)
file = open("1M","w")
n*=10
printrandomInts(n,file)
file = open("10M","w")
n*=10
printrandomInts(n,file)
file = open("100M","w")
printrandomInts(n*10,file)

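When I run the code above, the largest file is reported by Windows as 868,053 KB. Should I be using a binary representation to store the integers efficiently? I also have to generate similar data for floats and strings. What can I do to make this more space-efficient?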

Answer 1

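If all you want is the counts for later analysis, you can use @TomKarzes's idea of counting the numbers as they are generated, and use the pickle module to store the counts: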

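import random, pickle

# One counter per possible value, 0 through 9999999.
counts = [0]*10000000

for i in range(100000000):
    num = random.randint(0,9999999)
    counts[num] += 1

# bytes(counts) stores each count in one byte; it raises ValueError if any count exceeds 255.
with open('counts.p','wb') as f:
    pickle.dump(bytes(counts), f)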

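The file counts.p is only 9.53 MB on my Windows machine. That is one byte for each of the 10 million counters, an impressive average of well under 1 byte (roughly 0.1 byte) per generated number. (The vast majority of counts will be between 5 and 15, so the stored numbers are on the small side.)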
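As a rough check on that file size, the sketch below pickles a bytes object with 10 million entries and reports its length. It assumes Python 3 with the default pickle protocol; pickled_size is a hypothetical helper name used only for illustration, and the zero-filled buffer stands in for the real counts, which occupy the same space.

```python
import pickle

def pickled_size(n_counters):
    """Length in bytes of a pickled bytes object holding n_counters one-byte counts."""
    # bytes(n) is a buffer of n zero bytes; real counts would take the same space.
    return len(pickle.dumps(bytes(n_counters)))

size = pickled_size(10_000_000)
print(size, "bytes =", size / 2**20, "MiB")  # 1 MiB = 1,048,576 bytes
```

The pickle framing adds only a handful of bytes on top of the 10,000,000 bytes of payload, so the size is essentially fixed by the number of counters, not by how many integers were generated.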

To load them:

with open('counts.p','rb') as f:
    counts = pickle.load(f)
counts = list(counts)  # iterating over a bytes object yields ints

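A final remark: I used bytes(counts) rather than simply counts in the pickle dump because, in this problem, it is very unlikely that any count will exceed 255. If the counts could be larger in some other situation, skip this step.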
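If counts above 255 are possible, one option besides pickling the plain list is to fall back to a wider fixed-width container from the standard array module. The following is only a sketch under that assumption: pack_counts and unpack_counts are hypothetical helper names, and the counts are assumed to be non-negative and to fit in 64 bits.

```python
import pickle
from array import array

def pack_counts(counts):
    """Pick the narrowest unsigned container that can hold every count."""
    largest = max(counts, default=0)
    if largest <= 0xFF:
        return bytes(counts)        # 1 byte per count
    if largest <= 0xFFFF:
        return array('H', counts)   # unsigned short, at least 2 bytes per count
    return array('Q', counts)       # unsigned long long, 8 bytes per count

def unpack_counts(packed):
    """Turn the packed container back into a list of ints."""
    return list(packed)

# Round trip through pickle with a count that does not fit in one byte.
restored = unpack_counts(pickle.loads(pickle.dumps(pack_counts([3, 300, 12]))))
print(restored)
```

The trade-off is that a single large count widens the storage for every counter, so this only pays off when large counts are rare enough to matter but common enough to rule out bytes.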
