I have been trying to restructure some lists, for example:
[42351, 4253, 1264, 5311, 3651] # The first number in a list is an ID
[42352, 4254, 1244, 1246, 5311, 1264, 3651]
[42353, 1254, 1264]
into the following format:
# ID \t 1 \t the_second_number_in_a_list \t ID \t 2 \t the_third_number_in_a_list \t ID \t 3 \t the_fourth_number_in_a_list ...
42352 1 4254 42352 2 1244 42352 3 1246 42352 4 5311 42352 5 1264 42352 6 3651
42353 1 1254 42353 2 1264
42351 1 4253 42351 2 1264 42351 3 5311 42351 4 3651
My idea was to build an intermediate dictionary that already has the required layout:
list_dic = {42352: [42352, 1, 4254, 42352, 2, 1244, 42352, 3, 1246, 42352, 4, 5311, 42352, 5, 1264, 42352, 6, 3651], 42353: [42353, 1, 1254, 42353, 2, 1264], 42351: [42351, 1, 4253, 42351, 2, 1264, 42351, 3, 5311, 42351, 4, 3651]}
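and then save it to a tab-delimited txt file.
However, I realized that in practice I may have hundreds of thousands of lists, and my approach would be slow and computationally expensive. I am looking for suggestions to speed up my code and reduce the memory the whole process needs. Thanks.
My code is attached: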
seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]
# First, group all information into a single list
seq_list = [seq1, seq2, seq3]
# Second, construct a dictionary to store all information
list_dic = {}
for each_seq in seq_list:
    j = 1
    list_dic[each_seq[0]] = []
    for each_item in each_seq[1:]:
        list_dic[each_seq[0]].append(each_seq[0])
        list_dic[each_seq[0]].append(j)
        list_dic[each_seq[0]].append(each_item)
        j += 1
# Third, save the information into a txt file
text_file = open("Output.txt", "w")
for each_id in list_dic:
    line = '\t'.join(str(each_num) for each_num in list_dic[each_id])
    text_file.write(line+'\n')
text_file.close()
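Answer 0 (score: 4)
from itertools import chain, count, cycle
# Text mode ("w") so that writing str works on Python 3 as well as Python 2
with open("out.txt", "w") as f:
    for eachlist in alllists:
        # Triples of (ID, 1-based position, value) for everything after the ID
        merged = zip(cycle([eachlist[0]]), count(1), eachlist[1:])
        f.write("\t".join(map(str, chain.from_iterable(merged))))
        f.write("\n")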
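As far as I can tell, there is no reason to build the intermediate dictionary
(that said, your existing solution also looks perfectly workable, although it may be a bit slower).
For @SirParselot: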
>>> seq1 = [42351, 4253, 1264, 5311, 3651]
>>> seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
>>> seq3 = [42353, 1254, 1264]
>>> alllists = [seq1, seq2, seq3]
>>> for eachlist in alllists:
...     merged = zip(cycle([eachlist[0]]), count(1), eachlist[1:])
...     print("\t".join(map(str, chain.from_iterable(merged))))
...
42351 1 4253 42351 2 1264 42351 3 5311 42351 4 3651
42352 1 4254 42352 2 1244 42352 3 1246 42352 4 5311 42352 5 1264 42352 6 3651
42353 1 1254 42353 2 1264
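To make the zip/cycle/count step easier to follow, here is a small sketch of what it produces before the values are flattened. It assumes Python 3, and `triples` is a hypothetical helper name used only for illustration, not part of the answer's code.

```python
from itertools import chain, count, cycle

def triples(seq):
    """Pair every value after the ID with the ID and a 1-based position."""
    # zip stops at its shortest input, so the infinite cycle/count
    # iterators end as soon as seq[1:] is exhausted.
    return list(zip(cycle([seq[0]]), count(1), seq[1:]))

seq3 = [42353, 1254, 1264]
steps = triples(seq3)
print(steps)  # [(42353, 1, 1254), (42353, 2, 1264)]

# chain.from_iterable flattens the triples into one flat row of fields.
row = "\t".join(map(str, chain.from_iterable(steps)))
print(row.split("\t"))  # ['42353', '1', '1254', '42353', '2', '1264']
```

A list that contains only an ID yields no triples, so its output line would be empty.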
Answer 1 (score: 1)
I am assuming you will never have two or more lists with the same ID, so here is my code
seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]
# First, group all information into a single list
seq_list = [seq1, seq2, seq3]
# Second, put lists directly into text with desired format
text_file = open("Output.txt", "w")
for i in seq_list:
    for j in range(1,len(i)): # skip the first element and go to the end of the list
        text_file.write(str(i[0]) + '\t' + str(j) + '\t' + str(i[j]) + '\t')
    text_file.write('\n')
text_file.close()
Instead of building an intermediate dictionary, it writes the lists directly to a text file in the format you described
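One possible variation on the direct-write idea, sketched under a few assumptions: the lists can arrive from any iterable (for example a generator), so the whole collection never has to sit in memory, and the fields are joined with tabs so that no trailing tab is left at the end of a line, which the loop above does leave. The names `write_sequences` and `example_sequences` are hypothetical and exist only for this illustration.

```python
import io

def write_sequences(seqs, out):
    """Write one tab-separated line per sequence to a file-like object.

    `seqs` may be any iterable, including a generator, so the full
    collection of lists never has to be in memory at once.
    """
    for seq in seqs:
        seq_id = str(seq[0])
        fields = []
        for position, value in enumerate(seq[1:], 1):
            fields.extend((seq_id, str(position), str(value)))
        # join puts tabs only between fields, so there is no trailing tab
        out.write("\t".join(fields) + "\n")

def example_sequences():
    # Hypothetical generator standing in for a real data source
    yield [42351, 4253, 1264, 5311, 3651]
    yield [42353, 1254, 1264]

buffer = io.StringIO()
write_sequences(example_sequences(), buffer)
output = buffer.getvalue()
```

With a real file, the same function could be called as `write_sequences(seq_list, f)` inside a `with open("Output.txt", "w") as f:` block.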
Answer 2 (score: 1)
A solution that does not use itertools:
sqs = [
    [42351, 4253, 1264, 5311, 3651],
    [42352, 4254, 1244, 1246, 5311, 1264, 3651],
    [42353, 1254, 1264]
]
for sq in sqs:
    gen = ((sq[0], i, v) for i, v in enumerate(sq[1:], 1))
    print(' '.join([str(x) for sub in gen for x in sub]))