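I have a large file like the input below, in which every 4 lines belong to the same ID, i.e. the line starting with @. The second line (the one right after the @ line) is a string of characters, and for some IDs this line is missing. When that is the case, I want to remove all 4 lines that belong to that ID. I tried the code below in Python, and it gave an error.
Input: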
@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1
ATCCGGCTCGGAGGA
+
1AA?ADDDADDAGGG
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1
GCGCAGCGGAAGCGTGCTGGG
+
CCCCBCDCCCCCGGEGGGGGG
@M00872:361:000000000-D2GK2:1:1101:16217:1352 1:N:0:1
+
Output:
@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1
ATCCGGCTCGGAGGA
+
1AA?ADDDADDAGGG
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1
GCGCAGCGGAAGCGTGCTGGG
+
CCCCBCDCCCCCGGEGGGGGG
import fileinput

with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
    for l in f:
        if l.strip().startswith("@"):
            c = 2
            next_line = f.readline().strip()
            if not next_line:
                while c:
                    c -= 1
                    try:
                        next(f)
                    except StopIteration:
                        break
            else:
                print(l.strip())
                print(next_line.strip())
                while c:
                    c -= 1
                    try:
                        print(next(f).strip())
                    except StopIteration:
                        break
But it did not work and raised this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: FileInput instance has no attribute '__exit__'
Do you know how to fix this?
Answer 0 (score: 2)
The fileinput.FileInput class does not seem to implement the __exit__() method, which it needs if you want to use it in a with fileinput.input(...) statement.
Answer 1 (score: 1)
I think the problem is that this Python version (2.7) does not support using fileinput in a with statement.
Use
f = fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak")
instead of
with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
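If you would still like the with form on 2.7, one possible workaround is to wrap the object in contextlib.closing(), which only requires a close() method. The following is a minimal sketch, not a tested drop-in for your file: the function name drop_empty_records is made up for illustration, and it assumes a missing sequence shows up as an empty line, so every record still has exactly 4 lines. A truncated record at the end of the file is dropped.

```python
import contextlib
import fileinput
import sys


def drop_empty_records(path, backup=".bak"):
    """Rewrite `path` in place, keeping 4-line records whose 2nd line is non-empty.

    Hypothetical helper for illustration; `backup` is the extension that
    fileinput appends to the original file name for the backup copy.
    """
    # closing() supplies __enter__/__exit__ and calls f.close() on exit,
    # so this works even where FileInput itself is not a context manager.
    with contextlib.closing(
        fileinput.input(files=path, inplace=True, backup=backup)
    ) as f:
        record = []
        for line in f:
            record.append(line)
            if len(record) == 4:
                # record[1] is the sequence line; skip the record if it is blank
                if record[1].strip():
                    # with inplace=True, sys.stdout is redirected into the file
                    sys.stdout.write("".join(record))
                record = []
```

Calling drop_empty_records("4415_pool.fastq") would then rewrite the file and leave the original next to it as 4415_pool.fastq.bak.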
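Answer 2 (score: 1)
Although the with statement was added in 2.5, I don't think fileinput was ported to use it (contextlib?).
Your code will run in Python 3 but not in 2.7. To fix this, either use py3 or port the code to iterate over the lines like this:
with open(filename, "r") as f:
    lines = f.readlines()

for line in lines:
    # do whatever you need to do for each line.
    pass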
Answer 3 (score: 0)
As a solution to your problem (in 2.7), I would do something like this:
# Read all the lines in a buffer
with open('input.fastq', 'r') as source:
    source_buff = iter(source.readlines())

with open('output.fastq', 'w') as out_file:
    for line in source_buff:
        if line.strip().startswith('@'):
            prev_line = line
            line = next(source_buff)
            if line.strip():
                # if the 2nd line is not empty write the whole block in the output file
                out_file.write(prev_line)
                out_file.write(line)
                out_file.write(next(source_buff))
                out_file.write(next(source_buff))
            else:
                pass
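I know .fastq files can sometimes be very large, so rather than reading the whole file into a buffer, I would suggest putting this code in a loop that reads 4 lines (or however many lines your block has) at a time.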
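A rough sketch of that block-at-a-time loop could look like the following. The function name filter_fastq and the record_size parameter are made up for illustration, and it again assumes that a missing sequence is an empty line, so records are always 4 lines long. Only one record is held in memory at a time.

```python
from itertools import islice


def filter_fastq(src_path, dst_path, record_size=4):
    """Copy records with a non-empty 2nd line from src_path to dst_path.

    Hypothetical helper for illustration; returns the number of records kept.
    """
    kept = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        while True:
            # pull the next record_size lines from the file iterator
            record = list(islice(src, record_size))
            if len(record) < record_size:
                # end of file, or a truncated trailing record (dropped)
                break
            if record[1].strip():
                dst.writelines(record)
                kept += 1
    return kept
```

For example, filter_fastq('input.fastq', 'output.fastq') would write the filtered records to output.fastq and leave input.fastq untouched.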