Question

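I have a sed command that needs to be run from Python code on Linux (using os.system()) or converted to Python code. However, I don't know what this sed command actually does. I would appreciate it if you could give me the code or show me how to implement it in Python with os.system, because I have been getting a lot of errors with os.system.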

sed -n '1~4s/^@/>/p;2~4p' file1.fastq > file1.fasta

By the way, the input and output files should be defined dynamically in my Python code:

seq_file1 = '6448.fastq'
input_file1 = os.path.join(sys.path[0],seq_file1)
os.system(os.path.join("sed -n '1~4s/^@/>/p;2~4p' "+ seq_file1 + ' > ' + os.path.splitext(os.path.basename(input_file1))[0]+".fasta") , shell = True)

Answer 1

What exactly does this sed command do?

This sed command applies two different operations to the file in a single pass.

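-n: suppresses sed's automatic printing of every line. Only lines to which the p command (or the p flag) is applied are printed.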

1~4: apply the following command to every 4th line, starting from line 1.

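s/^@/>/p: replace a leading @ with > and, if a replacement was made, print the result. Because of the address above, this command is applied to every 4th line starting from line 1.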

;: command separator.

2~4: apply the following command to every 4th line, starting from line 2.

p: print the line.

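In other words: "In every 4th line starting from line 1, replace the leading @ with > and print the line; also print every 4th line starting from line 2."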
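The first~step address form is a GNU sed extension. As a rough illustration (not sed's actual implementation), the matching rule for a positive step can be sketched in Python like this; the function name matches_step_address is hypothetical:

```python
def matches_step_address(line_number: int, first: int, step: int) -> bool:
    """Return True if a 1-based line number matches the sed address first~step.

    Sketch of GNU sed's rule for step > 0: the address matches line `first`
    and every `step`-th line after it.
    """
    return line_number >= first and (line_number - first) % step == 0


if __name__ == "__main__":
    lines = range(1, 13)  # line numbers 1 to 12
    print("1~4 matches:", [n for n in lines if matches_step_address(n, 1, 4)])
    print("2~4 matches:", [n for n in lines if matches_step_address(n, 2, 4)])
```

For lines 1 to 12 this selects lines 1, 5 and 9 for 1~4 and lines 2, 6 and 10 for 2~4.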

Example:

Contents of file1.fastq:

@ line 1
@ line 2
@ line 3
@ line 4
@ line 5
@ line 6
@ line 7
@ line 8
@ line 9
@ line 10
@ line 11
@ line 12

Running sed -n '1~4s/^@/>/p;2~4p' file1.fastq > file1.fasta

Contents of file1.fasta:

> line 1
@ line 2
> line 5
@ line 6
> line 9
@ line 10

A good reference is: http://www.gnu.org/software/sed/manual/sed.html

How do I do the same thing in Python?

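The following snippet is meant to be educational, so I deliberately avoided many Python language features that could be used to improve the algorithm.

I tested it several times and it works for me.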

# import Regular Expressions module
import re

output = []

# Open the input file in read mode
with open('file1.fastq', 'r') as file_in:
    replace_step = 1 # replacement starts in line #1
    print_step = 0   # printing starts in line #2, so this counter starts one step behind
    for line in file_in:
        if replace_step == 1:
            # Like sed's s///p: keep the line only if a leading @ was actually replaced
            new_line, count = re.subn('^@', '>', line)
            if count:
                output.append(new_line)
        if replace_step >= 4:
            replace_step = 1
        else:
            replace_step += 1

        if print_step == 1:
            output.append(line)
        if print_step >= 4:
            print_step = 1
        else:
            print_step += 1

    print("".join(output))

# Open the output file in write mode
with open('file1.fasta', 'w') as file_out:
    file_out.write("".join(output))
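As noted above, Python features can shorten this loop. One possible compact variant, a sketch rather than a drop-in replacement, uses enumerate and the modulo operator instead of two counters and writes each line as soon as it is read; the function name fastq_to_fasta is hypothetical:

```python
def fastq_to_fasta(in_path: str, out_path: str) -> None:
    """Mimic: sed -n '1~4s/^@/>/p;2~4p' in_path > out_path."""
    with open(in_path) as file_in, open(out_path, "w") as file_out:
        for line_number, line in enumerate(file_in, start=1):
            # 1 for lines 1, 5, 9, ...; 2 for lines 2, 6, 10, ...
            position = line_number % 4
            if position == 1 and line.startswith("@"):
                # 1~4s/^@/>/p: replace the leading @, print only if replaced
                file_out.write(">" + line[1:])
            elif position == 2:
                # 2~4p: print the line unchanged
                file_out.write(line)
```

Usage: fastq_to_fasta('file1.fastq', 'file1.fasta'). Writing lines as they are read, instead of collecting them in a list, keeps memory use constant for large input files.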

Answer 2

You can also use subprocess.run:

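import subprocess

seq_file_in = '6448.fastq'
seq_file_out = '6448.fasta'
with open(seq_file_out, 'w') as fw:
    # -n is required: without it, sed prints every input line, plus the p-matched lines a second time
    subprocess.run(["sed", "-n", r"1~4s/^@/>/p;2~4p", seq_file_in], stdout=fw)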

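When the sed command is as short and clear as this one, subprocess.run is very handy. Because the arguments are passed as a list, no shell is involved, so the sed script needs no extra quoting and the output redirection is done through stdout=fw.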
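If you prefer to keep os.system as in the question, note that os.system() accepts only the command string, so the shell=True argument in the question raises a TypeError, and os.path.join is not needed around the command. A minimal sketch, assuming GNU sed is installed and reusing the question's variable names (the helper build_sed_command is hypothetical), quotes each part with shlex.quote so that paths containing spaces survive the shell:

```python
import os
import shlex
import sys

SED_SCRIPT = "1~4s/^@/>/p;2~4p"  # same script as in the question


def build_sed_command(input_path: str, output_path: str) -> str:
    """Build a shell command line: sed -n SCRIPT input > output."""
    return "sed -n {} {} > {}".format(
        shlex.quote(SED_SCRIPT), shlex.quote(input_path), shlex.quote(output_path)
    )


if __name__ == "__main__":
    seq_file1 = "6448.fastq"
    input_file1 = os.path.join(sys.path[0], seq_file1)
    # As in the question: same base name with .fasta, written to the current directory
    output_file1 = os.path.splitext(os.path.basename(input_file1))[0] + ".fasta"
    if os.path.exists(input_file1):
        status = os.system(build_sed_command(input_file1, output_file1))
        if status != 0:
            print("sed failed with status", status)
```

For example, build_sed_command('6448.fastq', '6448.fasta') produces the same command line as in the question, with the sed script in single quotes.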
