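I have a FASTQ file in which each entry is 4 lines (two lines before the '+' and one after it). How can I read each group of 4 lines into a single list element?
The file looks like this: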
@DQNZZQ1:756:C3K7PACXX:6:1101:2383:2061 1:N:0:CCGTCC
GAACCCCACTGTGCACCACCTGTCTCTTATACACATCTAGATGTGTATAAGAGACAGAGATGGGGGCGACGACATTTTTGCAGCTGATGCTAAACGCGGA
+
@@CFFFFFHHHDHJJJIIJJJIHHGGGG<E@C9CDFHG>ABFGGADFHGIGEHCHHGEEC:GHGEH/8=?@99554>CC5CDCCDD=CD44>C@>@@DD@
@DQNZZQ1:756:C3K7PACXX:6:1101:2486:2062 1:N:0:CCGTCC
GCCCAAGACGGCCCCCGCTCCGCGTCGGTTCATCGGTTCCTCGGGGCAAGGATGTTCCCAGGTTGTTTGTGAGGAGAGTGTCTCTTTTTCACATCTTGTG
+
@@@DDDDDFFFFFIIIE8?FG)6@############################################################################
@DQNZZQ1:756:C3K7PACXX:6:1101:2359:2093 1:N:0:CCGTCC
TAAGATATTGGCAAGCAATATAGCTTTCTTCACGCGCCACACAGTTTCCCGGCTGTAGCGGTGACGACGGGGCAGACGGTGGAGGTGTTTCCTGCAGACT
+
@@@?DDFBFHGFD<@GGHCEHFCDHIHGHIIIIIFGIIGEFHGFD@DHFHBEBHGAC3)-99>?ABBB=@&5>;5889B0<<???8848<@@########
@DQNZZQ1:756:C3K7PACXX:6:1101:2319:2168 1:N:0:CCGTCC
AAGTTTAATAAGCAAACCCTGGGAACTGCGACGGTCTTCGGCACTGTCTACAAATGACGCGTCACAGAAGACCTCTAAACCTCGATCCAGTTATCGCTGT
+
==@4:BDBDBB?8AFGHIEHHIII;F3?1?FF?F0????C@FA;DEEGHEC;?=CADCB=A/3'5:@A>?CCC:>@A:49?A<B5>??CCA>>+>18?##
@DQNZZQ1:756:C3K7PACXX:6:1101:2337:2170 1:N:0:CCGTCC
GGCGACTGTGTTTGCCAAGATGGAGCGCGACCTGCGGCGGCCGGGTGCCGTGTTTGCCGAGGCGGGCGCACCCGCCCGCTGGGAGACGGGCCCCAACTAG
+
;=?DD:::DFCCCFGIGIIGGIBCHIIIID@GHIIIBEB>B@B@-)5??B05?AC9>AB5<77@####################################
This is what I have so far:
import sys
from itertools import islice

forward = open(sys.argv[1], 'r')
reverse = open(sys.argv[2], 'r')
output = open(sys.argv[3], 'r')
for reads in forward:
    freads_full = islice(forward, 4)
    for line in freads_full:
        flist = line
Thanks for your help!
Answer 0: (score: 0)
The itertools docs have a recipe called grouper that does exactly what you want:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
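The recipe works because the list holds n references to one and the same iterator, so every output tuple pulls n consecutive items from it. Here is a small sketch of that behavior, calling zip_longest directly; the sample lines are made up for illustration and are not taken from the file in the question:

```python
from itertools import zip_longest

# Made-up sample: one complete record and one record missing its quality line.
lines = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "TTGA\n", "+\n"]

it = iter(lines)  # a single shared iterator
# Passing the same iterator four times makes each tuple consume 4 consecutive lines.
batches = list(zip_longest(it, it, it, it))

for batch in batches:
    print(batch)
```

The last tuple is padded with the fill value (None by default) when the number of lines is not a multiple of 4, which is worth checking for before processing a batch.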
You can then do something like
with open(sys.argv[1]) as forward:
    for batch in grouper(forward, 4):
        # do stuff with the batch (a tuple of 4 lines)
        pass
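You can then work with the lines however you like. For example, to concatenate them you can do
''.join(batch)
Note that sum(batch) is not an alternative here: sum() refuses to add strings and raises a TypeError.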
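To get what the question asks for, one list element per 4-line entry, something along these lines might work. This is only a sketch: the function name read_fastq_records is made up for illustration, and it assumes every record is exactly 4 lines (no sequences wrapped over several lines). It uses plain zip instead of zip_longest, so an incomplete trailing record is dropped rather than padded with None.

```python
def read_fastq_records(path):
    """Return a list with one 4-element list per FASTQ record.

    Assumes strict 4-line records; the function name is hypothetical.
    """
    records = []
    with open(path) as handle:
        # Strip trailing newlines lazily, one line at a time.
        stripped = (line.rstrip("\n") for line in handle)
        # Four references to the same generator: each step consumes 4 lines.
        # zip() stops at the first shortfall, so a partial last record is skipped.
        for header, sequence, separator, quality in zip(*[stripped] * 4):
            records.append([header, sequence, separator, quality])
    return records
```

Calling read_fastq_records(sys.argv[1]) would then return a list in which records[0] holds the header, sequence, '+' line and quality string of the first read.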