Editing a text file

Time: 2017-05-29 13:24:51

Tags: python

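I have a large file like the input below, in which every 4 lines belong to the same ID, i.e. the line starting with @. The second line (the one right after the @ line) is a sequence of characters, but for some IDs this line is empty. When that is the case, I want to delete all 4 lines belonging to that ID. I tried the code below in Python, but it raises an error.

Input: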

@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1
ATCCGGCTCGGAGGA
+
1AA?ADDDADDAGGG
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1
GCGCAGCGGAAGCGTGCTGGG
+
CCCCBCDCCCCCGGEGGGGGG
@M00872:361:000000000-D2GK2:1:1101:16217:1352 1:N:0:1

+

Output:

@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1
ATCCGGCTCGGAGGA
+
1AA?ADDDADDAGGG
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1
GCGCAGCGGAAGCGTGCTGGG
+
CCCCBCDCCCCCGGEGGGGGG


import fileinput

with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
    for l in f:
        if l.strip().startswith("@"):
            c = 2
            next_line = f.readline().strip()  
            if not next_line:   
                while c:        
                    c -= 1
                    try:
                        next(f)
                    except StopIteration:
                        break
            else:
                print(l.strip())
                print(next_line.strip())
                while c:
                    c -= 1
                    try:
                        print(next(f).strip())
                    except StopIteration:
                        break

But it did not work and raised this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: FileInput instance has no attribute '__exit__'
Do you know how to fix this?

4 answers:

Answer 0 (score: 2)

The fileinput.FileInput class does not seem to implement the __exit__() method, which it needs if you want to use it in a with fileinput.input() ... statement.
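A quick way to confirm this on a given interpreter is to check whether the class defines both context-manager methods. This is only a diagnostic sketch; `supports_with` is a hypothetical helper name, not part of fileinput. On Python 3.2 and later, where FileInput gained context-manager support, the check should report True; on 2.7 it should report False.

```python
import fileinput


def supports_with(cls):
    """Return True if `cls` defines both methods a `with` statement needs."""
    return hasattr(cls, "__enter__") and hasattr(cls, "__exit__")


print(supports_with(fileinput.FileInput))
```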

Answer 1 (score: 1)

I think the problem is that your Python version (2.7) does not support using fileinput in a with statement.

Use

f = fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak")

instead of

with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
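If you would still like the file to be closed automatically, one option that should work on both 2.7 and 3.x is to wrap the call in contextlib.closing(), which only requires the object to have a close() method. The sketch below is one possible way to do the filtering in place under that approach; `drop_empty_records` is a hypothetical name, and it assumes every record is exactly 4 lines long (a trailing incomplete record is dropped). Note that the backup argument is an extension appended to the file name, so ".bak" is used here rather than "file.bak".

```python
import contextlib
import fileinput


def drop_empty_records(path, backup=".bak"):
    """Rewrite `path` in place, keeping only 4-line records whose 2nd line is non-empty."""
    # contextlib.closing() calls f.close() on exit, so it works even where
    # FileInput itself has no __exit__ (for example Python 2.7).
    with contextlib.closing(
        fileinput.input(files=path, inplace=True, backup=backup)
    ) as f:
        record = []
        for line in f:
            record.append(line.rstrip("\n"))
            if len(record) == 4:
                if record[1].strip():
                    # With inplace=True, print() writes into the file being rewritten.
                    print("\n".join(record))
                record = []
```

Calling drop_empty_records("4415_pool.fastq") would rewrite the file and leave the original next to it as 4415_pool.fastq.bak.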

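Answer 2 (score: 1)

Although the with statement was added in 2.5, I don't think fileinput was ported to support it (contextlib?).

Your code will run in Python 3 but not in 2.7. To fix this, either use Python 3 or port the code to iterate over the lines like this: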

filename = "4415_pool.fastq"  # the file from the question

with open(filename, "r") as f:
    lines = f.readlines()

for line in lines:
    # do whatever you need to do for each line
    pass

Answer 3 (score: 0)

As a solution to your problem (in 2.7), I would do something like this:

# Read all the lines in a buffer
with open('input.fastq', 'r') as source:
  source_buff = iter(source.readlines())

with open('output.fastq', 'w') as out_file:
  for line in source_buff:
    if line.strip().startswith('@'):
      prev_line = line
      line = next(source_buff)

      if line.strip():
        # if the 2nd line is not empty write the whole block in the output file
        out_file.write(prev_line)
        out_file.write(line)
        out_file.write(next(source_buff))
        out_file.write(next(source_buff))
      else:
        pass

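I know .fastq files can sometimes be very large, so rather than reading the whole file into a buffer, I suggest putting this code in a loop in which you read 4 lines (or however many lines your block has) at a time.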
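A minimal sketch of that streaming variant, assuming every record is exactly `lines_per_record` lines long. The function name `filter_fastq` and its parameters are hypothetical, chosen only for illustration. It uses itertools.islice to pull one record at a time from the open file, so memory use should stay constant regardless of file size.

```python
from itertools import islice


def filter_fastq(src_path, dst_path, lines_per_record=4):
    """Copy records with a non-empty second line from src_path to dst_path.

    Returns the number of records kept.
    """
    kept = 0
    with open(src_path, "r") as source, open(dst_path, "w") as out_file:
        while True:
            # islice pulls at most `lines_per_record` lines, so only one
            # record is held in memory at a time.
            record = list(islice(source, lines_per_record))
            if not record:
                break  # end of file
            # Skip incomplete trailing records and records whose 2nd line is empty.
            if len(record) == lines_per_record and record[1].strip():
                out_file.writelines(record)
                kept += 1
    return kept
```

For example, filter_fastq('input.fastq', 'output.fastq') would write only the complete records to output.fastq and leave the input file untouched.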