I have a list:
sample = [['A','T','N','N'],['T', 'C', 'C', 'C']],[['A','T','T','N'],['T', 'T', 'C', 'C']]
I am trying to compress the lists so that only A / T / G / C remain in them. The output needs to be the list
[['AT','TCCC'],['ATT','TTCC']]
When I use this code:
tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in sample]
However, I only get the output:
['ATT','TTCC']
Any suggestions as to where I am going wrong?
In my actual code, I first transpose the list:
seq_list = [['TCCGGGGGTATC', 'TCCGTGGGTATC', ...]] # one nested list
numofpops = len(seq_list)
### Transposing. Moving along the columns only
#column_list = []
for k in range(len(seq_list)):
    column_list = [[] for i in range(len(seq_list[k][0]))]
    for seq in seq_list[k]:
        for i, nuc in enumerate(seq):
            column_list[i].append(nuc)
    ddd = column_list
    print(ddd)
tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in ddd]
print(tt)
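Answer 0 (score: 3)
Your actual code discards lists: column_list is rebuilt on every pass through the loop, so you only ever process the last entry.
Otherwise your code works fine. Just do the processing inside the loop and append each result to a final list: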
results = []
for k in range(len(seq_list)):
    column_list = [[] for i in range(len(seq_list[k][0]))]
    for seq in seq_list[k]:
        for i, nuc in enumerate(seq):
            column_list[i].append(nuc)
    # process `column_list` here, in the loop (no need to assign to ddd)
    tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in column_list]
    results.append(tt)
Note that you can use the zip() function to transpose the lists instead:
results = []
for sequence in seq_list:
    # zip(*sequence) yields one tuple per column
    column_list = zip(*sequence)
    tt = [''.join([y for y in x if y in 'AGTC']) for x in column_list]
    results.append(tt)
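As a quick check of the zip() approach, here is a minimal sketch for Python 3. The function name clean_columns and the short input sequences are made up for illustration; they are not taken from the question.

```python
def clean_columns(seq_list, allowed="AGTC"):
    """Transpose each group of sequences and keep only allowed letters per column."""
    results = []
    for sequences in seq_list:
        # zip(*sequences) yields one tuple per column position
        columns = zip(*sequences)
        results.append(["".join(c for c in col if c in allowed) for col in columns])
    return results

# Hypothetical input: two groups of equal-length sequences
groups = [["ATN", "TCN"], ["ANG", "TTC"]]
print(clean_columns(groups))  # [['AT', 'TC', ''], ['AT', 'T', 'GC']]
```

Each group gets its own entry in the result, so nothing is overwritten between iterations. A column that holds only N comes back as an empty string.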
Answer 1 (score: 1)
You want to do the following (note the extra outer brackets, so that sample is a single list):
sample = [[['A','T','N','N'],['T', 'C', 'C', 'C']], [['A','T','T','N'],['T', 'T', 'C', 'C']]]
Then:
tt = [[''.join([c for c in sublist if c in 'AGTC']) for sublist in doublet] for doublet in sample]
This may be more readable:
tt = [
    [''.join([c for c in sublist if c in 'AGTC'])
     for sublist in doublet]
    for doublet in sample
]
It gives the desired result:
[['AT', 'TCCC'], ['ATT', 'TTCC']]
Answer 2 (score: 1)
You can first create a helper function:
def filterJoin(s):
    return ''.join(x for x in s if x in 'ATGC')
Then apply it to every inner list of sample.
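One way to apply the helper is a nested list comprehension. This is a sketch, not necessarily the exact call originally intended, and it assumes sample is a single list with three levels of nesting, as in the other answers:

```python
def filterJoin(s):
    # Keep only the four nucleotide letters and join them into one string
    return ''.join(x for x in s if x in 'ATGC')

# Assumed input shape: a list of pairs of character lists
sample = [[['A','T','N','N'],['T','C','C','C']], [['A','T','T','N'],['T','T','C','C']]]
result = [[filterJoin(sub) for sub in pair] for pair in sample]
print(result)  # [['AT', 'TCCC'], ['ATT', 'TTCC']]
```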
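Answer 3 (score: 0)
I assume the input you gave is meant to be two items in one list. You would then use a list comprehension with two levels of nesting. At the deepest level, you filter out the items that are not A, T, G or C, and join the remaining items: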
sample = [[['A','T','N','N'],['T', 'C', 'C', 'C']], [['A','T','T','N'],['T', 'T', 'C', 'C']]]
result = [[''.join(i for i in lst if i in 'ATGC') for lst in sub] for sub in sample]
#                                 ^<- filter items
#             ^<- join the matching items
print(result)
# [['AT', 'TCCC'], ['ATT', 'TTCC']]