Question

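I have a problem with my Python code. I am processing a large (3.5 GB) JSON file containing scores, and I need to consume it in chunks of 21894 scores (all the scores for one query). The code works, but my test set has 4000 queries. The first 10 run quickly, but after that the time spent in this part of the code grows exponentially, so after 5 hours I had completed only 500 queries. The prints are there for logging, and they suggest the problem lies in translating the lines or appending them to the list. Does anyone know how to make this faster, or can see what is causing the slowdown?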

from itertools import islice

def getscorebatch(number):
    print('Creating Batch..')
    batch_temp = list()
    with open(json_file_name, 'r') as FileObj:
        print("Creating slice...")
        lines_gen = islice(FileObj, (21894 * number), ((21894 * number) + 21894))
        print("Appending slice...")
        for line in lines_gen:
            line = line.translate({ord(c): None for c in ':",}{ \n'})
            batch_temp.append(line)
    return batch_temp
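
One likely cause, offered as a reading of the code rather than a measured diagnosis: `islice(FileObj, start, stop)` cannot jump to line `start`. It has to read and discard every earlier line, and because the file is reopened on each call, the work per query grows with `number`. A minimal sketch of one way around this is to scan the file once, remember the byte offset where each batch begins, and `seek` straight to it. The names `build_batch_offsets` and `read_batch` are hypothetical, and the sketch assumes UTF-8 text with `\n` line endings.

```python
from itertools import islice

BATCH_SIZE = 21894  # lines per query, as in the question's code
# Build the translation table once instead of once per line.
STRIP_TABLE = str.maketrans("", "", ':",}{ \n')


def build_batch_offsets(path, batch_size=BATCH_SIZE):
    """Scan the file once and record the byte offset where each batch starts."""
    offsets = []
    position = 0
    # Binary mode so that len(line) is an exact byte count usable with seek().
    with open(path, "rb") as fh:
        for line_no, line in enumerate(fh):
            if line_no % batch_size == 0:
                offsets.append(position)
            position += len(line)
    return offsets


def read_batch(path, offsets, number, batch_size=BATCH_SIZE):
    """Jump directly to batch `number` and read only that batch's lines."""
    with open(path, "rb") as fh:
        fh.seek(offsets[number])
        lines = islice(fh, batch_size)
        # Assumption: the file is UTF-8 encoded.
        return [raw.decode("utf-8").translate(STRIP_TABLE) for raw in lines]
```

With the offsets built once up front, something like `read_batch(json_file_name, offsets, number)` should cost roughly the same for every `number`, at the price of one extra pass over the file.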

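Update: I tried implementing your suggestions and it is much faster! Thank you very much. I am still fairly new to generators, so what I don't understand now is how to get the right chunk. Every call gives me the first chunk.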

def generator(file_to_read):
    c = 0
    while c < 21894:
        data = file_to_read.readline()
        c += 1
        if not data:
            break
        yield data



def getscorebatch(number):
    print('Creating Batch..')
    batch_temp = [0]*22000
    with open(json_file_name, 'r') as FileObj:
        gen_file = generator(FileObj)
        batch_temp = [line.translate(line.maketrans("", "", REMOVE)) for line in gen_file]
        print(len(batch_temp))
    return batch_temp
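
On why every call returns the first chunk: `getscorebatch` reopens the file each time, so reading starts again at line 0, and `number` is never used. A minimal sketch of one alternative keeps a single file handle open and yields consecutive batches from it. The name `iter_score_batches` is hypothetical, and `REMOVE` is assumed to hold the same characters that the first version strips.

```python
from itertools import islice

REMOVE = ':",}{ \n'  # assumption: same characters as in the first version
BATCH_SIZE = 21894   # lines per query, as in the question's code


def iter_score_batches(path, batch_size=BATCH_SIZE):
    """Yield consecutive batches of cleaned lines, reading the file only once."""
    table = str.maketrans("", "", REMOVE)  # built once, reused for every line
    with open(path, "r") as file_obj:
        while True:
            # islice continues from the current file position, so each
            # iteration picks up exactly where the previous batch ended.
            batch = [line.translate(table) for line in islice(file_obj, batch_size)]
            if not batch:
                return  # end of file
            yield batch
```

A loop such as `for number, batch in enumerate(iter_score_batches(json_file_name)):` would then visit the queries in order without re-reading earlier parts of the file. This only suits sequential processing; jumping to an arbitrary query would need a different approach.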

Python - list appending gets slower and slower?

0 Answers: