How do I compress a list of lists in Python?

Asked: 2016-10-10 20:25:06

Tags: python list

I have a list:

sample = [['A','T','N','N'],['T', 'C', 'C', 'C']],[['A','T','T','N'],['T', 'T', 'C', 'C']].

I am trying to compress it so that only A/T/G/C remain in each inner list. The output needs to be a list of lists:

[['AT','TCCC'],['ATT','TTCC']]

I am using this code:

tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in sample]

However, the only output I get is:

['ATT','TTCC']

Any suggestions on where I am going wrong?

In my actual code, I first transpose the list:

seq_list = [['TCCGGGGGTATC', 'TCCGTGGGTATC', ...]]  # one nested list

numofpops = len(seq_list)

### Transposing. Moving along the columns only

#column_list = []
for k in range(len(seq_list)):
    column_list = [[] for i in range(len(seq_list[k][0]))]
    for seq in seq_list[k]:
        for i, nuc in enumerate(seq):
            column_list[i].append(nuc)
            ddd = column_list
    print ddd

tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in ddd]
print tt

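4 answers:

Answer 0 (score: 3)

Your actual code is discarding lists. You only ever process the last entry, because ddd holds whatever was assigned in the final iteration of the loop.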
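A minimal sketch of why only the last entry survives, using made-up data rather than the asker's real sequences. The names `groups` and `last_seen` are hypothetical and stand in for `seq_list` and `ddd`:

```python
# Hypothetical stand-in for the question's data: two groups of sequences.
groups = [["AN", "TC"], ["AT", "TT"]]

for group in groups:
    # The name is rebound on every pass, so earlier values are lost.
    last_seen = group

# After the loop, only the final group is still reachable through the name.
print(last_seen)
```

Anything that reads the variable after the loop therefore sees only the final group, which matches the single result in the question.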

Otherwise your code works just fine. Just do the processing inside the loop and append each result to a final list:

results = []

for k in range(len(seq_list)):
    column_list = [[] for i in range(len(seq_list[k][0]))]
    for seq in seq_list[k]:
        for i, nuc in enumerate(seq):
            column_list[i].append(nuc)
    # process `column_list` here, in the loop (no need to assign to ddd)
    tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in column_list]

    results.append(tt)
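For a quick check, here is a self-contained Python 3 sketch of the same loop run on made-up data. The contents of `seq_list` and the `VALID` set are assumptions for illustration, not the asker's real input:

```python
# Hypothetical data: two groups of equal-length sequences ('N' marks an unknown base).
seq_list = [["ATNN", "TCCC"], ["ATTN", "TTCC"]]
VALID = {"A", "G", "T", "C"}

results = []
for group in seq_list:
    # Build one list per column position (the manual transpose).
    column_list = [[] for _ in range(len(group[0]))]
    for seq in group:
        for i, nuc in enumerate(seq):
            column_list[i].append(nuc)
    # Filter and join inside the loop so that every group contributes a result.
    results.append(["".join(n for n in col if n in VALID) for col in column_list])

print(results)
# [['AT', 'TC', 'C', 'C'], ['AT', 'TT', 'TC', 'C']]
```

Each group now produces its own list of column strings, rather than only the last group being kept.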

Note that you can use the zip() function to transpose the lists instead:

results = []
for sequence in seq_list:
    # zip(*sequence) yields one tuple per column position
    tt = [''.join([y for y in column if y in 'AGTC']) for column in zip(*sequence)]
    results.append(tt)
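As a small illustration of how `zip(*...)` transposes, here is a sketch on a single made-up group. The `rows` data is hypothetical:

```python
# Hypothetical group of two equal-length sequences.
rows = ["ATN", "TCC"]

# Unpacking the rows into zip() pairs up the characters at each position.
columns = list(zip(*rows))
print(columns)
# [('A', 'T'), ('T', 'C'), ('N', 'C')]

# Each column can then be filtered and joined as a whole.
joined = ["".join(n for n in col if n in "AGTC") for col in columns]
print(joined)
# ['AT', 'TC', 'C']
```

Note that `zip()` stops at the shortest input, so this assumes all sequences in a group have the same length.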

Answer 1 (score: 1)

You want to do the following:

sample = [[['A','T','N','N'],['T', 'C', 'C', 'C']], [['A','T','T','N'],['T', 'T', 'C', 'C']]]

Then:

tt = [[''.join([c for c in sublist if c in 'AGTC']) for sublist in doublet] for doublet in sample]

This may be more readable:

tt = [
    [''.join([c for c in sublist if c in 'AGTC'])
     for sublist in doublet]
    for doublet in sample
]

It gives the desired result:

[['AT', 'TCCC'], ['ATT', 'TTCC']]
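One detail worth keeping in mind: `c in 'AGTC'` is a substring test, whereas the set used in the question tests for an exact element. With single-character items the two agree, but they differ for longer or empty strings. A minimal sketch (the variable names are hypothetical):

```python
allowed_str = "AGTC"
allowed_set = {"A", "G", "T", "C"}

# A two-character string is a substring of "AGTC" but not a member of the set.
in_str = "AG" in allowed_str
in_set = "AG" in allowed_set
print(in_str, in_set)

# The empty string is a substring of every string.
empty_in_str = "" in allowed_str
print(empty_in_str)
```

If the inner lists could ever hold items longer than one character, the set form is the stricter choice.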

Answer 2 (score: 1)

You can first create a helper function:

def filterJoin(s):
    return ''.join(x for x in s if x in 'ATGC')

Then apply it to each inner list of the nested sample.
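A minimal sketch of how the helper might be applied, assuming the question's `sample` is wrapped in an outer list. The nested comprehension is an illustration and not necessarily the answerer's exact code:

```python
def filterJoin(s):
    # Keep only the four nucleotide letters and join them into one string.
    return ''.join(x for x in s if x in 'ATGC')

# Assumption: the question's data, wrapped in an outer list.
sample = [[['A', 'T', 'N', 'N'], ['T', 'C', 'C', 'C']],
          [['A', 'T', 'T', 'N'], ['T', 'T', 'C', 'C']]]

# Apply the helper to every inner list, preserving the two-level structure.
result = [[filterJoin(inner) for inner in pair] for pair in sample]
print(result)
# [['AT', 'TCCC'], ['ATT', 'TTCC']]
```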

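Answer 3 (score: 0)

I assume the input you gave is meant to be two items in one list. You would then use a list comprehension with two levels of nesting. At the deepest level, you filter out the items that are not A, T, G, or C and join the remaining ones: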

sample =  [[['A','T','N','N'],['T', 'C', 'C', 'C']], [['A','T','T','N'],['T', 'T', 'C', 'C']]]

result = [[''.join(i for i in lst if i in 'ATGC') for lst in sub] for sub in sample]
#                                  ^<- filter items
#           ^<- join the matching items
print(result)
# [['AT', 'TCCC'], ['ATT', 'TTCC']]