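Parsing a simple table

Time: 2014-02-19 16:36:36

Tags: perl awk

For each line of the input file, I want to print the field that contains the string 'locus_tag=', or a dash if no field matches.

Input file (tab-separated):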

GeneID_2=7277058    location=890211..892127 locus_tag=HAPS_0907 orientation=+
GeneID_2=7278144    gene=rlmL   location=complement(1992599..1994776)   locus_tag=HAPS_2029
GeneID_2=7278145    gene=rlmT   location=complement(1992599..1994776)   timetoparse

Desired output:

locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

I tried this, but it did not work:

awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }' SNP_annotations_ON_PROTEIN

7 answers:

Answer 0 (score: 6)

perl -lpe '($_)= (/(locus_tag=\S+)/, "-")' file

Output:

locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

Answer 1 (score: 2)

 perl -nE 'say m/(locus_tag=\S*)/ ? $1 : q/-/'

Answer 2 (score: 1)

You were very close:

$ awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}' a
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

What you had:

{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }

What I wrote:

{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}
                                                       ^^^^  ^^^^^^^^^^^
                        if found, print and go to next line        |
    if you arrive here, it is because you did not find the pattern, so print dash
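
The "print and go to the next line, otherwise fall through to the dash" logic can be tried without the original data file. The following is a minimal sketch, assuming a POSIX shell and any standard awk; the variable names `sample` and `result` are made up for this illustration, and the three records are the ones from the question, rebuilt with real tab characters.

```shell
# Rebuild the three sample records with real tabs (variable name is hypothetical).
sample=$(printf 'GeneID_2=7277058\tlocation=890211..892127\tlocus_tag=HAPS_0907\torientation=+\nGeneID_2=7278144\tgene=rlmL\tlocation=complement(1992599..1994776)\tlocus_tag=HAPS_2029\nGeneID_2=7278145\tgene=rlmT\tlocation=complement(1992599..1994776)\ttimetoparse\n')

# For each line: print the first field containing "locus_tag=" and move on,
# or print a dash when the loop finishes without a match.
result=$(printf '%s\n' "$sample" | awk -F'\t' '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /locus_tag=/) { print $i; next }  # found: print, skip to next line
    print "-"                                       # reached only when nothing matched
}')

printf '%s\n' "$result"
```

Because `next` abandons the current record as soon as a field matches, the final `print "-"` runs at most once per line, which is what the original attempt was missing.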

Answer 3 (score: 1)

Using awk:

awk '/locus_tag/{for(x=1;x<=NF;x++) if($x~/^locus_tag=/) print $x;next}{print "-"}' file

Answer 4 (score: 1)

You can play with FS to make this easier:

awk -F'locus_tag=' 'NF>1{sub(/\s.*/,"",$2);print FS $2;next}$0="-"' f  
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-
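
Note that `\s` inside an awk regex is a GNU awk extension, so the command above may behave differently under other awk implementations. A possible portable variant, offered only as a sketch, replaces `\s` with the bracket expression `[ \t]` and prints the dash explicitly; the variable names `sample` and `fs_result` are made up for this illustration.

```shell
# Two of the sample records from the question, rebuilt with real tabs
# (variable name is hypothetical).
sample=$(printf 'GeneID_2=7277058\tlocation=890211..892127\tlocus_tag=HAPS_0907\torientation=+\nGeneID_2=7278145\tgene=rlmT\tlocation=complement(1992599..1994776)\ttimetoparse\n')

# Split each line on the literal text "locus_tag=". If the line was split
# (NF > 1), the value is at the start of $2: cut it at the first blank or tab
# and print it with the separator put back in front.
fs_result=$(printf '%s\n' "$sample" | awk -F'locus_tag=' '
    NF > 1 { sub(/[ \t].*/, "", $2); print FS $2; next }
           { print "-" }                      # no separator found on this line
')

printf '%s\n' "$fs_result"
```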

Answer 5 (score: 1)

Using perl:

perl -ne 'print /(locus_tag=.*?)\s/?"$1\n":"-\n"' file
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

Answer 6 (score: 1)

$ awk '{print (match($0,/locus_tag=[^[:space:]]*/) ? substr($0,RSTART,RLENGTH) : "-")}' file
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-