Question

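Every line of my data starts with a specific pattern. I want to remove just that pattern, not the whole line, in Python. After reading it from the actual file, my data looks like this: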

>homo_seg-Val-abc-1-1
>homo_seg-Beg-cdf-2-1
>homo_seg-Try-gfh-3-2
>homo_seg-Fuss-cdh-3-1

Here, I want to remove ">homo_seg-" from the dataset and keep only the following:

Val-abc-1-1
Beg-cdf-2-1
Try-gfh-3-2
Fuss-cdh-3-1

I can do this in Perl:

$new =~s/homo_seg-//g;

My code is:

import sys
inFile = sys.argv[1]
with open(inFile) as fasta:
    for line in fasta:
        if line.startswith('>'):
            header = line.split()
            t = header[0]

        import re  # from below answer

        regex = r">homo_seg-"

        subst = ""

        result = re.sub(regex, subst, t, 0, re.MULTILINE)
        print(result)

This code only outputs the last line. I know it has some small mistake, but I can't spot it.

Answer 1

Try this:

new_line = old_line[9:]

Or, if you want to be safer:

if old_line.startswith('homo_seg-'):
    new_line = old_line[9:]
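
Note that the header lines in the question start with ">homo_seg-", which is 10 characters rather than 9, so the slice offset depends on whether the ">" has already been removed. A minimal sketch that avoids hard-coding the length, assuming a hypothetical helper name `strip_prefix` and a `PREFIX` constant (neither is from the original answer):

```python
PREFIX = ">homo_seg-"  # assumed prefix, taken from the sample data in the question

def strip_prefix(line, prefix=PREFIX):
    """Return line without the leading prefix; leave other lines unchanged."""
    if line.startswith(prefix):
        return line[len(prefix):]
    return line

headers = [">homo_seg-Val-abc-1-1", ">homo_seg-Fuss-cdh-3-1", "ACGT"]
cleaned = [strip_prefix(h) for h in headers]
print(cleaned)
```

On Python 3.9 and later, `line.removeprefix(prefix)` should give the same result without a helper.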

Answer 2

You can check https://regex101.com/r/hvFquS/1

import re

regex = r"homo_seg-"

test_str = ("homo_seg-Val-abc-1-1\n"
            "homo_seg-Beg-cdf-2-1\n"
            "homo_seg-Try-gfh-3-2\n"
            "homo_seg-Fuss-cdh-3-1")

subst = ""

# count and flags are passed by keyword; passing them positionally is deprecated since Python 3.13
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)
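
To apply the same substitution to the file from the question, the replacement and the `print` probably need to run inside the loop, once per header line, rather than after it. A minimal sketch, assuming a hypothetical function name `clean_headers` and using an in-memory `io.StringIO` in place of the real file:

```python
import io
import re

# Anchored so that only a prefix at the very start of the line is removed.
PREFIX_RE = re.compile(r"^>homo_seg-")

def clean_headers(lines):
    """Yield each header line (starting with '>') with the prefix removed."""
    for line in lines:
        if line.startswith(">"):
            header = line.split()[0]  # first whitespace-separated token
            yield PREFIX_RE.sub("", header)

# Stand-in for open(sys.argv[1]); sequence lines are skipped.
fasta = io.StringIO(
    ">homo_seg-Val-abc-1-1\nACGT\n>homo_seg-Beg-cdf-2-1\nTTGA\n"
)
result = list(clean_headers(fasta))
for header in result:
    print(header)
```

With a real file, `clean_headers(open(path))` inside a `with` block would take the place of the `StringIO` object.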

Search for a specific pattern and remove it from a line in Python
