Question

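I am writing a script in Python, and I have to create a pretty big list containing exactly 248956422 integers. Some of the "0" entries in this table will be changed to 1, 2 or 3, because I have 8 lists: 4 with the start positions of genes and 4 with their end positions. I have to iterate over "anno" several times, because the numbers that replace 0 can change again in later iterations. "anno" then has to be written to a file to create an annotation file. Here is my question: how can I split this up, or do it on the fly, so that I don't get a MemoryError, while still replacing "0" with other values and 1, 2, 3 with others later on? Maybe by rewriting the file? I'm waiting for your advice; please ask if what I wrote is not clear :P.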

whole_st_gen = []   # to make these lists clearer, an example:
whole_end_gen = []  # if whole_st_gen has the element 177
whole_st_ex = []    # and whole_end_gen has 200, then for positions 177 to 200
whole_end_ex = []   # I need to put "1"
whole_st_mr = []    # of course these lists can have 1,000,000+ elements
whole_end_mr = []   # note that the st/end lists of the same kind have equal length
whole_st_nc = []
whole_end_nc = []   # these lists contain some values, of course
length = 248956422
anno = ['0' for i in range(0, length)]  # here I get the MemoryError
# then I wanted to do something like...
for j in range(0, len(whole_st_gen)):
    for y in range(whole_st_gen[j],whole_end_gen[j]):
        anno[y]='1'

Answer 1

You can use a bytearray object to get a much more compact memory representation than a list of integers:

anno = bytearray(b'\0' * 248956422)
print(anno[0])  # → 0
anno[0] = 2
print(anno[0])  # → 2
print(anno.__sizeof__())  # → 248956447 (on my computer)
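Building on the bytearray idea, here is a minimal sketch of how the intervals could be written with slice assignment instead of a per-position loop. The function name build_annotation and the (starts, ends, tag) layer format are assumptions made for illustration, not part of the original answer. It stores ASCII digits rather than raw 0..3 values, on the assumption that the annotation file should contain one character per position, and it treats the end position as exclusive, as range() does in the question.

```python
def build_annotation(length, layers):
    """Return a bytearray of ASCII digits, one byte per position."""
    anno = bytearray(b"0") * length            # every position starts as "0"
    for starts, ends, tag in layers:           # later layers overwrite earlier ones
        for st, end in zip(starts, ends):
            # Clamp to the array bounds so slice assignment never resizes it.
            st, end = max(st, 0), min(end, length)
            if st < end:
                anno[st:end] = tag * (end - st)   # tag is one byte, e.g. b"1"
    return anno


# Small-scale demo; the real call would pass length=248956422.
demo = build_annotation(10, [([2], [5], b"1"), ([4], [7], b"2")])
print(demo.decode("ascii"))  # → 0011222000
```

Because the result is already bytes, it can be written in one call to a file opened in binary mode, for example with f.write(anno).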

Answer 2

You might be better off determining the value of each element of anno on the fly:

def anno():
    for idx in xrange(248956422):  # Python 2; use range() in Python 3
        elm = "0"

        for j in range(0, len(whole_st_gen)):
            if whole_st_gen[j] <= idx < whole_end_gen[j]:
                elm = "1"                    

        for j in range(0, len(whole_st_ex)):
            if whole_st_ex[j] <= idx < whole_end_ex[j]:
                elm = "2"                    

        for j in range(0, len(whole_st_mr)):
            if whole_st_mr[j] <= idx < whole_end_mr[j]:
                elm = "3"                    

        for j in range(0, len(whole_st_nc)):
            if whole_st_nc[j] <= idx < whole_end_nc[j]:
                elm = "4"                    

        yield elm

Then you simply iterate with for elm in anno().
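Since the values are meant to end up in a file, one way to consume such a generator is to write it out in chunks rather than one character at a time. The sketch below is only an illustration: write_annotation and chunk_size are hypothetical names, the one-character-per-position file layout is an assumption, and a short string stands in for anno() so the example runs instantly.

```python
import io
from itertools import islice


def write_annotation(tags, out, chunk_size=1 << 20):
    """Write an iterable of one-character tags to `out`, chunk by chunk."""
    it = iter(tags)
    while True:
        # Pull at most chunk_size tags, so only one chunk is in memory at a time.
        chunk = "".join(islice(it, chunk_size))
        if not chunk:
            break
        out.write(chunk)


# Stand-in for anno(): any iterable of "0".."4" strings works.
demo_tags = iter("0011222000")
buf = io.StringIO()            # a real run would use open(path, "w") instead
write_annotation(demo_tags, buf, chunk_size=4)
result = buf.getvalue()
```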

I got an edit suggestion from the OP proposing a separate function for each of whole_*_gen, whole_st_ex and so on, something like this:

def anno_st():
    for idx in xrange(248956422):  # Python 2; use range() in Python 3
        elm = "0"

        for j in range(0, len(whole_st_ex)):
            if whole_st_ex[j] <= idx < whole_end_ex[j]:
                elm = "2"

        yield elm

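That is doable, of course, but it would only apply the changes from whole_*_ex, and you would then need to combine the results when writing to the file, which can be a bit awkward: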

for a, b, c, d in zip(anno_st(), anno_ex(), anno_mr(), anno_nc()):
    if d != "0":
        write_to_file(d)
    elif c != "0":
        write_to_file(c)
    elif b != "0":
        write_to_file(b)
    else:
        write_to_file(a)

However, if you only want to apply some of the change sets, you could write a function that takes them as arguments:

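def anno(*args):
    for idx in xrange(248956422):  # Python 2; use range() in Python 3
        elm = "0"

        # Each argument is a (starts, ends, tag) triple; later triples win.
        for st, end, tag in args:
            for j in range(0, len(st)):
                if st[j] <= idx < end[j]:
                    elm = tag

        yield elm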

Then call it by supplying the lists (for example, only the first two change sets):

 for tag in anno((whole_st_gen, whole_end_gen, "1"),
                 (whole_st_ex, whole_end_ex, "2")):
     write_to_file(tag)
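To check the calling convention and the override order on a small scale, here is a sketch assuming Python 3. The name anno_layers and its length parameter are hypothetical additions so that the example finishes instantly. Note that every position scans every interval, so the running time grows with positions times intervals.

```python
def anno_layers(length, *layers):
    """Yield one tag per position; each layer is a (starts, ends, tag) triple."""
    for idx in range(length):
        elm = "0"
        for starts, ends, tag in layers:
            for st, end in zip(starts, ends):
                if st <= idx < end:        # end position is exclusive
                    elm = tag              # later layers overwrite earlier ones
        yield elm


genes = ([2], [5], "1")    # made-up intervals, for illustration only
exons = ([4], [7], "2")
both = "".join(anno_layers(10, genes, exons))
genes_only = "".join(anno_layers(10, genes))
```

Here both is "0011222000" and genes_only is "0011100000": position 4 belongs to both intervals and ends up with the tag of the layer passed last.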

Answer 3

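Instead of creating a list with a list comprehension, I suggest using a generator expression to create an iterator that produces the values on demand rather than keeping them all in memory. You also don't need the variable i in your loop, since it is just a throwaway variable that you never use.

anno = ('0' for _ in range(0, length))  # In Python 2.x use xrange() instead of range()

But note that an iterator is a one-shot iterable: once you have iterated over it, you cannot use it again. If you want to use it multiple times, you can create independent iterators from it, for example with itertools.tee().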
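A minimal sketch of the one-shot behaviour and two ways around it, assuming Python 3 and a tiny length as a stand-in for 248956422. The helper make_anno is a hypothetical name. itertools.tee buffers every item that one copy has consumed and another has not, so it only saves memory when the copies advance roughly together.

```python
import itertools

length = 5  # small stand-in for 248956422


def make_anno():
    # Re-creating the generator is cheap: nothing is stored up front.
    return ('0' for _ in range(length))


# Option 1: two independent iterators over one underlying generator.
first, second = itertools.tee(make_anno(), 2)
pairs = list(zip(first, second))   # consumed in lockstep, so tee buffers little

# Option 2: build a fresh generator for every pass.
first_pass = "".join(make_anno())
second_pass = "".join(make_anno())

# A generator that has been exhausted yields nothing more.
once = make_anno()
consumed = list(once)
leftover = list(once)
```

If the copies are consumed one after the other, tee ends up holding the whole sequence in memory, so for a sequence of this size re-creating the generator is probably the safer choice.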

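Also note that if you want to change some elements based on a condition, you can do that by iterating over the iterator and applying the condition in a generator expression.

For example: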
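A minimal sketch of changing elements by wrapping one generator expression in another, assuming Python 3. The set marked is a hypothetical collection of positions to retag, and the short length is only there to keep the example small.

```python
length = 10  # small stand-in for 248956422
anno = ('0' for _ in range(length))

marked = {2, 3, 4}  # hypothetical positions that should become '1'

# Nothing is evaluated until new_anno is iterated.
new_anno = ('1' if idx in marked else tag for idx, tag in enumerate(anno))

result = "".join(new_anno)
print(result)  # → 0011100000
```

Each further change can be layered the same way, by building another generator expression on top of new_anno.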
