awk to print fields using a conditional match, with a default value for no match, across two files

Time: 2017-03-17 12:51:34

Tags: awk

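I am trying to use awk to match the contents of each line in file against $2 in list. Both files are tab-delimited, and the matching name in list may contain spaces or special characters. For example, the name in file is BRCA1 but in list it is BRCA 1, or the name in file is BCR but in list it is BCR/ABL.

If there is a match and $4 of list contains full gene sequence, then $2 and $1 are printed, separated by a tab. If no match is found, the unmatched name and 14 are printed, separated by a tab. The awk below does execute, but it produces no output. Thank you :).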

file

BRCA1
BCR
SCN1A
fbn1

list

List code   gene    gene name   methodology
81  DMD dystrophin  deletion analysis and duplication analysis
811 BRCA 1   BRCA2  full gene sequence and full deletion/duplication analysis
70  ABL1    ABL1    gene analysis variants in the kinse domane
71  BCR/ABL t(9;22) full gene sequence

AWK

awk -F'\t' -v OFS="\t" 'FNR==NR{A[$1]=$0;next} ($2 in A){if($4=="full gene sequence"){print A[$2],$1}} ELSE {print A[$2],"14"}' file list
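
A small sketch of one possible reason the command above prints nothing, assuming a POSIX awk: keywords are case-sensitive, so ELSE is read as an unset variable used as a pattern, which is always false. In addition, $2 in A is an exact key lookup, so BRCA 1 and BCR/ABL never match the keys BRCA1 and BCR stored from file. The toy input below (the lines a and b) is made up for illustration.

```shell
# ELSE in upper case is not the else keyword; it is an unset variable,
# so the second rule has a pattern that is always false.
out=$(printf 'a\nb\n' | awk '($1 == "a") {print "if-branch", $1} ELSE {print "else-branch", $1}')
printf '%s\n' "$out"
```

Only the line a produces output here; the ELSE rule never fires for the line b.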

Desired output

BRCA1   811
BCR 71
SCN1A   14
fbn1     85

Edit

List code   gene    gene name   methodology
85  fbn1    Fibrillin   full gene sequencing
95  FBN1    fibrillin   del/dup

Result

85  fbn1    Fibrillin   full gene sequencing

Since only this line has full gene sequencing in it, only that line is printed.

2 answers:

Answer 0: (score: 1)

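awk 'FNR==NR{
          a[$2]=$1;        # first input (list): map gene name ($2) to list code ($1)
          next
      }
     {
       for(i in a){        # second input (file): test each stored gene name
           # match if either name matches the other when used as a regex
           if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
       }
        print $1,14        # no match: print the name with the default 14
     }'  list file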

Input

$ cat list 
List code   gene    gene name   methodology
81  DMD dystrophin  deletion analysis and duplication analysis
811 BRCA 1   BRCA2  full gene sequence and full deletion/duplication analysis
70  ABL1    ABL1    gene analysis variants in the kinse domane
71  BCR/ABL t(9;22) full gene sequence

$ cat file 
BRCA1
BCR
SCN1A

Output

$ awk 'FNR==NR{
          a[$2]=$1;
          next
      }
     {
       for(i in a){ 
           if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
       } 
        print $1,14 
     }'  list file
BRCA1 811
BCR 71
SCN1A 14
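
The loop above treats each name as a regular expression. A hedged variant, not part of the original answer, swaps the regex test for index(), which does a literal substring test, so characters such as . or ( in a name are matched literally. The sketch below rebuilds the sample files in a temporary directory (the tmp variable is only for illustration) and keeps the default whitespace field splitting, so $2 is the first word of the gene name, as in the command above.

```shell
tmp=$(mktemp -d)
printf 'BRCA1\nBCR\nSCN1A\n' > "$tmp/file"
# printf repeats the format for every group of four arguments (one row each)
printf '%s\t%s\t%s\t%s\n' \
  'List code' 'gene' 'gene name' 'methodology' \
  '81' 'DMD' 'dystrophin' 'deletion analysis and duplication analysis' \
  '811' 'BRCA 1' 'BRCA2' 'full gene sequence and full deletion/duplication analysis' \
  '70' 'ABL1' 'ABL1' 'gene analysis variants in the kinse domane' \
  '71' 'BCR/ABL' 't(9;22)' 'full gene sequence' > "$tmp/list"

out=$(awk 'FNR == NR { a[$2] = $1; next }   # list: gene name -> list code
{
    for (i in a)
        # literal substring test in both directions, no regex involved
        if (index($1, i) || index(i, $1)) { print $1, a[i]; next }
    print $1, 14                            # default when nothing matched
}' "$tmp/list" "$tmp/file")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Like the original loop, this does not look at the methodology column, and a very short name in file could match more than one list entry.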

Answer 1: (score: 1)

You can try,

awk 'BEGIN{FS=OFS="\t"}
FNR==NR{
    if(NR>1){
        gsub(" ","",$2)       #removing white space
        n=split($2,v,"/")
        d[v[1]] = $1          #from split, first element as key
    } 
    next
}{print $1, ($1 in d?d[$1]:14)}' list file

you get,

BRCA1   811
BCR 71
SCN1A   14
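
The command above does not check the methodology column. A minimal sketch of one way to add that condition, under several assumptions that are not in the original answer: the list is truly tab-delimited, the two extra rows from the question's edit are appended to the list, the substring "full gene sequenc" is used so that both "full gene sequence" and "full gene sequencing" qualify, and matching stays case-sensitive (fbn1 and FBN1 are distinct). The tmp directory and the array names code and part are made up for illustration.

```shell
tmp=$(mktemp -d)
printf 'BRCA1\nBCR\nSCN1A\nfbn1\n' > "$tmp/file"
# printf repeats the format for every group of four arguments (one row each)
printf '%s\t%s\t%s\t%s\n' \
  'List code' 'gene' 'gene name' 'methodology' \
  '81' 'DMD' 'dystrophin' 'deletion analysis and duplication analysis' \
  '811' 'BRCA 1' 'BRCA2' 'full gene sequence and full deletion/duplication analysis' \
  '70' 'ABL1' 'ABL1' 'gene analysis variants in the kinse domane' \
  '71' 'BCR/ABL' 't(9;22)' 'full gene sequence' \
  '85' 'fbn1' 'Fibrillin' 'full gene sequencing' \
  '95' 'FBN1' 'fibrillin' 'del/dup' > "$tmp/list"

out=$(awk 'BEGIN { FS = OFS = "\t" }
FNR == NR {
    # keep only data rows whose methodology mentions full gene sequence/sequencing
    if (FNR > 1 && index($4, "full gene sequenc")) {
        key = $2
        gsub(/ /, "", key)         # "BRCA 1" -> "BRCA1"
        split(key, part, "/")      # "BCR/ABL" -> part[1] = "BCR"
        code[part[1]] = $1
    }
    next
}
{ print $1, (($1 in code) ? code[$1] : 14) }' "$tmp/list" "$tmp/file")
printf '%s\n' "$out"
rm -rf "$tmp"
```

With this sample data the four lines printed are the ones listed under the desired output in the question, separated by tabs.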