Question

I am trying to modify my .fasta files from this:

>YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3]
MSNVLLKQ...

>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1]
MRTPSKSE...

>YP_009226430.1 DNA packaging protein [Achromobacter phage phiAxp-2]
MMNSDAVI...

to this:

>Achromobacter phage phiAxp-3
MSNVLLKQ...

>Achromobacter phage phiAxp-1
MRTPSKSE...

>Achromobacter phage phiAxp-2
MMNSDAVI...

I already have a script that does this for a single file:

with open('Achromobacter.fasta', 'r') as fasta_file, \
        open('./fastas3/Achromobacter.fasta', 'w') as out_file:
    for line in fasta_file:
        line = line.rstrip()
        if '[' in line:
            # Keep only the text inside the last [...] as the new header.
            line = line.split('[')[-1]
            out_file.write('>' + line[:-1] + "\n")
        else:
            out_file.write(line + "\n")

However, I can't automate the process for all 120 files in my folder.

I tried using glob.glob, but I can't seem to get it to work:

import glob

for fasta_file in glob.glob('*.fasta'):
    outfile = open('./fastas3/'+fasta_file, 'w')
    with open(fasta_file, 'r'):
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")

It gives me this output:

A
c
i
n
e
t
o
b
a
c
t
e
r
.
f
a
s
t
a

I managed to get a list of all the files in the folder, but I can't open the files using the entries in the list.

import os


file_list = []
for file in os.listdir("./fastas2/"):
    if file.endswith(".fasta"):
        file_list.append(file)

Answer 1

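Since you can already rewrite the contents of a single file, what remains is to automate the process. First, the single-file logic is turned into a function that opens the file only once instead of using two file handles.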

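def file_changer(filename):
    """Rewrite the FASTA headers of `filename` in place."""
    data_to_put = ''
    with open(filename, 'r+') as fasta_file:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                # Keep only the text inside the last [...] as the new header.
                line = line.split('[')[-1]
                data_to_put += '>' + line[:-1] + "\n"
            else:
                data_to_put += line + "\n"
        # Go back to the start, overwrite, and drop any leftover old content.
        fasta_file.seek(0)
        fasta_file.write(data_to_put)
        fasta_file.truncate()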

Now we need to iterate over all the files, so let's use the glob module:

import glob
for file in glob.glob('*.fasta'):
    file_changer(file)
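The os.listdir attempt from the question can be made to work as well. A likely reason the files could not be opened is that os.listdir returns bare file names, not paths, so each name has to be joined with its directory first. Here is a minimal sketch of that idea; list_fasta_paths is a hypothetical helper name, not part of the original code.

```python
import os


def list_fasta_paths(directory):
    """Return the full paths of the .fasta files in `directory`, sorted by name."""
    paths = []
    for name in os.listdir(directory):
        if name.endswith(".fasta"):
            # os.listdir returns bare file names, so prepend the directory.
            paths.append(os.path.join(directory, name))
    return sorted(paths)
```

Each returned path can then be passed to a function such as file_changer, for example by looping over list_fasta_paths("./fastas2/").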

Answer 2

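You are iterating over the file name, which gives you all the characters in the name rather than the lines of the file. Here is a corrected version of the code: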

import glob

for fasta_file_name in glob.glob('*.fasta'):
    with open(fasta_file_name, 'r') as fasta_file, \
            open('./fastas3/' + fasta_file_name, 'w') as outfile:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
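To see why the original loop wrote one character per line, it may help to compare the two kinds of iteration side by side. This is only an illustration: io.StringIO stands in for an opened file, and the sample header text is made up.

```python
import io

file_name = "Acinetobacter.fasta"

# Iterating over a string yields its characters one at a time.
chars = [item for item in file_name]

# Iterating over a file object yields whole lines.
# io.StringIO stands in for an opened file here.
fake_file = io.StringIO(">header [phage A]\nMSNV\n")
lines = [line.rstrip() for line in fake_file]
```

In the question, the result of open() was never bound to a name, so the for loop fell into the first case.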

As an alternative to the Python script, you can simply use sed from the command line:

sed -i 's/^>.*\[\(.*\)\].*$/>\1/' *.fasta

This modifies all the files in place, so consider making copies first.
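If you would rather stay in Python, roughly the same substitution can be sketched with the re module. The pattern is the sed expression rewritten in Python regex syntax; rewrite_header is a hypothetical helper name, and the sketch assumes the organism name sits in the last pair of square brackets on the header line.

```python
import re

# Same idea as the sed expression: keep only the text inside the last [...].
HEADER_PATTERN = re.compile(r"^>.*\[(.*)\].*$")


def rewrite_header(line):
    """Return '>organism' for a bracketed header line; other lines are unchanged."""
    return HEADER_PATTERN.sub(r">\1", line)
```

Applied line by line while copying to a new file, this leaves sequence lines and headers without brackets untouched.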

Opening and editing multiple files in a folder with Python
