Question

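For each line of the input file, I want to print the field that contains the string 'locus_tag='. If no field matches, I want to print a dash instead.

Input file (tab-separated):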

GeneID_2=7277058    location=890211..892127 locus_tag=HAPS_0907 orientation=+
GeneID_2=7278144    gene=rlmL   location=complement(1992599..1994776)   locus_tag=HAPS_2029
GeneID_2=7278145    gene=rlmT   location=complement(1992599..1994776)   timetoparse

Desired output:

locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

I tried this, but it did not work:

awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }' SNP_annotations_ON_PROTEIN

Answer 1

perl -lpe '($_)= (/(locus_tag=\S+)/, "-")' file

Output:

locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

Answer 2

perl -nE 'say m/(locus_tag=\S*)/ ? $1 : q/-/' file

Answer 3

You were very close:

$ awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}' a
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

What you had:

{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }

What I wrote:

{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}
                                                       ^^^^  ^^^^^^^^^^^
                        if found, print and go to next line        |
    if you arrive here, it is because you did not find the pattern, so print dash
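
To make the fix easy to try, here is a minimal sketch that wraps the corrected program in a shell function and feeds it a sample built with printf, so the separators are guaranteed to be real tabs. The function name print_locus_tag is only an illustrative choice, not something from the question.

```shell
# print_locus_tag is a hypothetical helper name. It reads tab-separated
# lines from the given files (or from stdin) and prints one result per line.
print_locus_tag() {
    awk -F'\t' '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /locus_tag=/) { print $i; next }   # found: print it, move to next line
        print "-"                                       # reached only when no field matched
    }' "$@"
}

# Two sample records; printf reuses the format for each group of four arguments.
printf '%s\t%s\t%s\t%s\n' \
    'GeneID_2=7277058' 'location=890211..892127' 'locus_tag=HAPS_0907' 'orientation=+' \
    'GeneID_2=7278145' 'gene=rlmT' 'location=complement(1992599..1994776)' 'timetoparse' |
    print_locus_tag
```

This should print locus_tag=HAPS_0907 followed by a dash. As for why the original attempt misbehaved: in awk a bare /locus_tag=/ outside a match expression is evaluated as $0 ~ /locus_tag=/, which yields 0 or 1, so $i != /locus_tag=/ compares each field with that number and is true for every field here. Negated matching is written $i !~ /locus_tag=/, but even then the second loop would print one dash per non-matching field rather than one per line.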

Answer 4

Using awk:

awk '/locus_tag/{for(x=1;x<=NF;x++) if($x~/^locus_tag=/) print $x;next}{print "-"}' file

Answer 5

You can play with FS to make this easier:

awk -F'locus_tag=' 'NF>1{sub(/\s.*/,"",$2);print FS $2;next}$0="-"' f
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-
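
As far as I know, \s in a regex is a GNU awk extension rather than POSIX, so the one-liner above may behave differently in other awk implementations. The following is a sketch of the same FS trick using the bracket expression [ \t] instead, assuming that fields are separated only by spaces or tabs. The function name print_locus_tag_fs is hypothetical.

```shell
# print_locus_tag_fs is a hypothetical helper name. It splits each line on
# the literal string "locus_tag=", so $2 begins with the tag value.
print_locus_tag_fs() {
    awk -F'locus_tag=' '
        NF > 1 {
            sub(/[ \t].*/, "", $2)   # drop everything after the tag value
            print FS $2              # FS is the string "locus_tag="
            next
        }
        { print "-" }                # the line contained no locus_tag= at all
    ' "$@"
}

printf 'GeneID_2=7277058\tlocation=890211..892127\tlocus_tag=HAPS_0907\torientation=+\n' |
    print_locus_tag_fs
```

The explicit { print "-" } rule replaces the $0="-" idiom, which relies on the assignment evaluating to a non-empty string to trigger the default print action.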

Answer 6

Using perl:

perl -ne 'print /(locus_tag=.*?)\s/?"$1\n":"-\n"' file
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

Answer 7

$ awk '{print (match($0,/locus_tag=[^[:space:]]*/) ? substr($0,RSTART,RLENGTH) : "-")}' file
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

Parsing a simple table
