For each line of the input file, I want to print the field that contains the string 'locus_tag=', or a dash if no field matches.
Input file (tab-separated):
GeneID_2=7277058 location=890211..892127 locus_tag=HAPS_0907 orientation=+
GeneID_2=7278144 gene=rlmL location=complement(1992599..1994776) locus_tag=HAPS_2029
GeneID_2=7278145 gene=rlmT location=complement(1992599..1994776) timetoparse
Expected output:
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-
I tried this, but it did not work:
awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }' SNP_annotations_ON_PROTEIN
Answer 0 (score: 6)
perl -lpe '($_)= (/(locus_tag=\S+)/, "-")' file
Output:
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-
Answer 1 (score: 2)
perl -nE 'say m/(locus_tag=\S*)/ ? $1 : q/-/'
Answer 2 (score: 1)
You were very close:
$ awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}' a
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-
What you had:
{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }
What I wrote:
{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}
                                                       ^^^^  ^^^^^^^^^^^
                                                       |     |
                                                       |     reached only if no field matched, so print a dash
                                                       match found: print the field, then skip to the next line
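As for why the original attempt printed extra dashes: in awk, a bare regex such as /locus_tag=/ used as an operand is shorthand for `$0 ~ /locus_tag=/`, which evaluates to 1 or 0. So `$i != /locus_tag=/` compares each field with that number instead of testing the field against the pattern, and the second loop prints a dash for practically every field. Below is a minimal sketch of the same fix with an explicit flag instead of `next`, in case the control flow is easier to follow that way. The function name `print_locus_tag` and the variable `found` are illustrative, not from the question.

```shell
# Illustrative helper (hypothetical name): reads tab-separated records on stdin.
print_locus_tag() {
  awk -F'\t' '{
    found = 0                       # reset the flag for every input line
    for (i = 1; i <= NF; i++)
      if ($i ~ /^locus_tag=/) {     # field starts with locus_tag=
        print $i
        found = 1
        break                       # stop at the first matching field
      }
    if (!found) print "-"           # one dash per line without a match
  }'
}

# Two records from the question, rebuilt with real tabs.
printf 'GeneID_2=7277058\tlocation=890211..892127\tlocus_tag=HAPS_0907\torientation=+\nGeneID_2=7278145\tgene=rlmT\tlocation=complement(1992599..1994776)\ttimetoparse\n' | print_locus_tag
```

For those two records this should print locus_tag=HAPS_0907 followed by a single dash.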
Answer 3 (score: 1)
Using awk:
awk '/locus_tag/{for(x=1;x<=NF;x++) if($x~/^locus_tag=/) print $x;next}{print "-"}' file
Answer 4 (score: 1)
You can play with FS to make this easier:
awk -F'locus_tag=' 'NF>1{sub(/\s.*/,"",$2);print FS $2;next}$0="-"' f
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-
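Why this works: with `-F'locus_tag='` the tag itself is the field separator, so `NF>1` is true only on lines that contain it, and `$2` holds everything after the tag. The `sub()` call trims `$2` at the first whitespace character, and the trailing `$0="-"` pattern is an assignment whose non-empty string result counts as true, so the default action prints the dash. Note that `\s` is a GNU awk extension. Here is a sketch of the same idea that should also run on other awk implementations, assuming the only whitespace in the data is spaces and tabs (the function name `locus_tag_by_fs` is illustrative):

```shell
# Illustrative helper (hypothetical name): reads records on stdin.
locus_tag_by_fs() {
  awk -F'locus_tag=' '
    NF > 1 {                        # the separator occurred, so the tag is present
      sub(/[ \t].*/, "", $2)        # drop everything from the first space or tab
      print FS $2                   # re-attach the separator text to the value
      next
    }
    { print "-" }                   # no tag on this line
  '
}

# Two records from the question, rebuilt with real tabs.
printf 'GeneID_2=7278144\tgene=rlmL\tlocation=complement(1992599..1994776)\tlocus_tag=HAPS_2029\nGeneID_2=7278145\tgene=rlmT\tlocation=complement(1992599..1994776)\ttimetoparse\n' | locus_tag_by_fs
```

Unlike the field-by-field loops, this version does not require the tag to start a field; it matches `locus_tag=` anywhere on the line.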
Answer 5 (score: 1)
Using perl:
perl -ne 'print /(locus_tag=.*?)\s/?"$1\n":"-\n"' file
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-
Answer 6 (score: 1)
$ awk '{print (match($0,/locus_tag=[^[:space:]]*/) ? substr($0,RSTART,RLENGTH) : "-")}' file
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-