Python: speeding up code that rebuilds lists

Time: 2015-06-29 17:28:44

Tags: python performance list

I have been trying to rebuild some lists, for example:

[42351, 4253, 1264, 5311, 3651]  # The first number in a list is an ID
[42352, 4254, 1244, 1246, 5311, 1264, 3651]
[42353, 1254, 1264]

into the following format:

# ID \t 1 \t the_second_number_in_a_list \t ID \t 2 \t the_third_number_in_a_list \t ID \t 3 \t the_fourth_number_in_a_list ...
42352   1   4254    42352   2   1244    42352   3   1246    42352   4   5311    42352   5   1264    42352   6   3651
42353   1   1254    42353   2   1264
42351   1   4253    42351   2   1264    42351   3   5311    42351   4   3651

My idea is to build an intermediate dictionary that already holds the desired format:

list_dic = {42352: [42352, 1, 4254, 42352, 2, 1244, 42352, 3, 1246, 42352, 4, 5311, 42352, 5, 1264, 42352, 6, 3651], 42353: [42353, 1, 1254, 42353, 2, 1264], 42351: [42351, 1, 4253, 42351, 2, 1264, 42351, 3, 5311, 42351, 4, 3651]}

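and then save it to a tab-separated txt file.

However, I realized that I may actually have hundreds of thousands of lists, and that my approach would be slow and computationally expensive. I am looking for suggestions to speed up my code and reduce the memory the whole process needs. Thanks.

Here is my code: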

seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]

# First, group all information into a single list
seq_list = [seq1, seq2, seq3]

# Second, construct a dictionary to store all information
list_dic = {} 
for each_seq in seq_list:
    j = 1
    list_dic[each_seq[0]] = []
    for each_item in each_seq[1:]:
        list_dic[each_seq[0]].append(each_seq[0])
        list_dic[each_seq[0]].append(j)
        list_dic[each_seq[0]].append(each_item)
        j += 1

# Third, save the information into a txt file   
text_file = open("Output.txt", "w")
for each_id in list_dic:
    line = '\t'.join(str(each_num) for each_num in list_dic[each_id])
    text_file.write(line+'\n')
text_file.close()

3 answers:

Answer 0: (score: 4)

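from itertools import chain, count, cycle

# alllists holds every input list, e.g. [seq1, seq2, seq3]
with open("out.txt", "w") as f:  # text mode ("wb" rejects str on Python 3)
    for eachlist in alllists:
        # pair the ID with 1, 2, 3, ... and each remaining value
        merged = zip(cycle([eachlist[0]]), count(1), eachlist[1:])
        f.write("\t".join(map(str, chain.from_iterable(merged))))
        f.write("\n")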

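As far as I can tell, there is no reason to create the intermediate dictionary.

(That said, your existing solution also looks perfectly workable, although it may be a little slower.)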
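To check the "a little slower" guess on your own data, a rough timing sketch along these lines may help. The function names `via_dict` and `direct` and the synthetic data are made up for illustration. Actual numbers depend on the machine and the Python version, so none are quoted here.

```python
import timeit
from itertools import chain, count, cycle


def via_dict(seq_list):
    """Question's approach: build the intermediate dictionary, then format."""
    list_dic = {}
    for seq in seq_list:
        row = list_dic[seq[0]] = []
        for j, item in enumerate(seq[1:], 1):
            row.extend((seq[0], j, item))
    return ["\t".join(map(str, row)) for row in list_dic.values()]


def direct(seq_list):
    """Answer's approach: format each list straight away, no dictionary."""
    return [
        "\t".join(map(str, chain.from_iterable(
            zip(cycle([seq[0]]), count(1), seq[1:]))))
        for seq in seq_list
    ]


# Synthetic data (an assumption): 10,000 lists with unique IDs, 6 values each.
data = [[i] + list(range(6)) for i in range(10000)]

# Both approaches should produce the same set of lines.
assert sorted(via_dict(data)) == sorted(direct(data))

for func in (via_dict, direct):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print("%s: %.3f s for 10 runs" % (func.__name__, seconds))
```

The sketch only times the formatting step. File I/O is left out on purpose so that the comparison is not dominated by disk speed.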

For @SirParselot:

>>> seq1 = [42351, 4253, 1264, 5311, 3651]
>>> seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
>>> seq3 = [42353, 1254, 1264]
>>> alllists = [seq1, seq2, seq3]
>>> for eachlist in alllists:
...     merged = zip(cycle([eachlist[0],]),count(1),eachlist[1:])
...     print "\t".join( map(str,chain.from_iterable(merged)) )
...
42351   1       4253    42351   2       1264    42351   3       5311    42351    4       3651
42352   1       4254    42352   2       1244    42352   3       1246    42352    4       5311    42352   5       1264    42352   6       3651
42353   1       1254    42353   2       1264
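The session above is Python 2 (`print` is a statement there). Here is a sketch of the same idea for Python 3, wrapped in a helper whose name `format_row` is invented here. It swaps `cycle([x])` for `itertools.repeat(x)`, which says "the same ID forever" a little more directly.

```python
from itertools import chain, count, repeat


def format_row(seq):
    """Return 'ID<TAB>1<TAB>v1<TAB>ID<TAB>2<TAB>v2...' for seq = [ID, v1, v2, ...]."""
    # zip stops as soon as seq[1:] runs out, so the infinite iterators are safe.
    triples = zip(repeat(seq[0]), count(1), seq[1:])
    return "\t".join(map(str, chain.from_iterable(triples)))


alllists = [
    [42351, 4253, 1264, 5311, 3651],
    [42352, 4254, 1244, 1246, 5311, 1264, 3651],
    [42353, 1254, 1264],
]

for eachlist in alllists:
    print(format_row(eachlist))
```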

Answer 1: (score: 1)

I am assuming you will never have two or more lists with the same ID, so here is my code:

seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]

# First, group all information into a single list
seq_list = [seq1, seq2, seq3]

# Second, put lists directly into text with desired format
text_file = open("Output.txt", "w")
for i in seq_list:
    for j in range(1,len(i)): #skip the first element and go to the end of the list
        text_file.write(str(i[0]) + '\t' + str(j) + '\t' + str(i[j]) + '\t')
    text_file.write('\n')
text_file.close()

Instead of creating an intermediate dictionary, it writes the lists directly to the text file in the format you described.
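One detail of the loop above: every line ends with an extra tab before the newline. If that matters to whatever reads the file, a variant along these lines avoids it by joining the fields instead of appending a tab after each triple. `write_rows` is a name made up for this sketch, and it accepts any file-like object so it can be tried without touching the disk.

```python
import io


def write_rows(seq_list, out):
    """Write one tab-separated line per list to the file-like object `out`."""
    for seq in seq_list:
        fields = []
        for j in range(1, len(seq)):  # skip the ID at index 0
            fields.extend((str(seq[0]), str(j), str(seq[j])))
        out.write("\t".join(fields) + "\n")  # join leaves no trailing tab


seq_list = [
    [42351, 4253, 1264, 5311, 3651],
    [42353, 1254, 1264],
]

buffer = io.StringIO()
write_rows(seq_list, buffer)
print(buffer.getvalue(), end="")

# Writing to a real file works the same way:
# with open("Output.txt", "w") as text_file:
#     write_rows(seq_list, text_file)
```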

Answer 2: (score: 1)

A solution that does not use itertools:

sqs = [
    [42351, 4253, 1264, 5311, 3651],
    [42352, 4254, 1244, 1246, 5311, 1264, 3651],
    [42353, 1254, 1264]
]

for sq in sqs:
    gen = ((sq[0], i, v) for i, v in enumerate(sq[1:], 1))
    print('\t'.join(str(x) for sub in gen for x in sub))  # tab-separated, as the question asks
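On the memory side of the question: if the lists can be produced one at a time (for example read from a file or a database) rather than collected in `sqs` first, the same `enumerate` idea can be streamed to disk so that only one formatted line needs to exist at a time. This is a sketch under that assumption. `iter_lines`, `produce_lists`, and the output path are hypothetical names.

```python
import os
import tempfile


def iter_lines(seqs):
    """Lazily yield one formatted, newline-terminated line per input list."""
    for sq in seqs:
        triples = ((sq[0], i, v) for i, v in enumerate(sq[1:], 1))
        yield "\t".join(str(x) for triple in triples for x in triple) + "\n"


def produce_lists():
    """Stand-in for a real data source that yields lists one by one."""
    yield [42351, 4253, 1264, 5311, 3651]
    yield [42352, 4254, 1244, 1246, 5311, 1264, 3651]
    yield [42353, 1254, 1264]


path = os.path.join(tempfile.gettempdir(), "Output.txt")
with open(path, "w") as f:
    f.writelines(iter_lines(produce_lists()))  # consumes the generator lazily
```

If the lists already sit in memory anyway, this buys little over a plain loop. The gain comes only when the source itself is lazy.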