Adding sequential numbering to matching text

Asked: 2017-10-18 15:52:00

Tags: search replace grep

I have a file that currently looks like this, for example:

>ENSOFAS001369_p |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001369_p |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001264_p |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001264_p |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001369_p |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Alydus_pilosus_contig594
>ENSOFAS004569_p |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS004569,probes-probe:,probes-source:Alydus_pilosus_contig30341

Focusing on the part of the text between the > and | symbols, I want to add sequential numbering based on the matching ENSOFAS numeric ID. That is, I want to take the above and turn it into this:

>ENSOFAS001369_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001369_p2 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001264_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001264_p2 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001369_p3 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Alydus_pilosus_contig594
>ENSOFAS004569_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS004569,probes-probe:,probes-source:Alydus_pilosus_contig30341

I can do the search in TextWrangler (>ENSOFAS(\d+)_p(.+)\r), but as far as I know a text editor can't do what I need in terms of adding the numbers after the _p. I think a macOS/Linux version of the search part might be grep -E ">ENSOFAS[0-9]\{6\}_p\s|", but I don't know how to do the numbering between the _p and the space before the |. The matching ENSOFAS numbers are not clustered together in the text file, but I could do some kind of sorting first if that were needed.

2 answers:

Answer 0 (score: 0):

If awk is an option in your setup:

$ awk '{cnt[$1]++; $1=$1""cnt[$1]; print}' file
>ENSOFAS001369_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001369_p2 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001264_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001264_p2 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001369_p3 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Alydus_pilosus_contig594
>ENSOFAS004569_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS004569,probes-probe:,probes-source:Alydus_pilosus_contig30341

Explanation: $1 holds the first field of each line, e.g. >ENSOFAS001369_p. We use the associative array cnt to count how many times each unique value of $1 has been seen, and modify $1 (before printing) to include the current count for that value.

The awk script could be shortened, but in this form it is arguably easier to read and understand.
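If the renumbered lines should go into a new file rather than just being printed to the screen, redirecting the output works; the name renumbered.fasta below is only an illustrative example:

$ awk '{cnt[$1]++; $1=$1""cnt[$1]; print}' file > renumbered.fasta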

Answer 1 (score: 0):

awk approach:

awk '{ $1=$1""++a[$1] }1' file

Output:

>ENSOFAS001369_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001369_p2 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Anoplocnemis_curvipes_contig204
>ENSOFAS001264_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001264_p2 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001264,probes-probe:,probes-source:Anoplocnemis_curvipes_contig6359
>ENSOFAS001369_p3 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS001369,probes-probe:,probes-source:Alydus_pilosus_contig594
>ENSOFAS004569_p1 |design:coreoidea-v1,designer:forthman,probes-locus:ENSOFAS004569,probes-probe:,probes-source:Alydus_pilosus_contig30341

An alternative using awk's sub() function:

awk '{ sub(/$/,++a[$1],$1) }1' file
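In both one-liners, sub(/$/,++a[$1],$1) (or the direct assignment $1=$1""++a[$1]) appends the running count for $1 to the end of the first field, and the trailing 1 is an always-true pattern whose default action is to print the (now modified) record. If GNU awk 4.1 or later is available (not part of the original answers; macOS ships BSD awk by default), the change can also be applied in place via gawk's inplace extension, for example:

gawk -i inplace '{ $1=$1""++a[$1] }1' file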