Question

The awk below produces the tab-delimited file1, in which the difference $3-$2 is calculated for each line and printed in $6. Before that awk is run, only 5 fields exist.
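The command that adds the sixth field is not shown in the post. A minimal sketch of how such a column could be produced, assuming a 5-column tab-delimited input (the file name `five_cols.tsv` is hypothetical, and the two rows are taken from the sample data):

```shell
# Hypothetical 5-column, tab-delimited input in a scratch directory.
tmp=$(mktemp -d)
printf 'chr5\t86669977\t86669995\tRASA1\tsplicing\n' >  "$tmp/five_cols.tsv"
printf 'chr5\t86670703\t86670805\tRASA1\texon\n'     >> "$tmp/five_cols.tsv"

# Append $3-$2 as a sixth field; FS and OFS are both set to a tab
# so the new field is joined with the same delimiter as the input.
with_len=$(awk 'BEGIN { FS = OFS = "\t" } { print $0, $3 - $2 }' "$tmp/five_cols.tsv")
printf '%s\n' "$with_len"

rm -rf "$tmp"
```

For these two rows the appended values are 18 and 102, matching the sixth column of the sample file1.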

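The problem I am having is updating the value of $2 in file2 with the sum of the $6 values in file1 for every line where $4 in file1 matches $1 in file2 and $5 in file1 is not intron. If $5 in file1 is intron, that line counts as zero. So, for example, the first line of file1 is intron and is therefore equal to zero or skipped (these lines are not needed in the calculation).

There may be $1 values in file2 that do not exist in file1; in that case the value of $2 in file2 is zero. Line 3 of file2 is an example: since it does not exist in file1, it is set to zero. Thank you :).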

awk to update file2

awk '
FNR==NR{   # process same line
  b[$4]=$3-$2
  next     # process next line
}
{ a[$5]+=($3-$2) }
{
  split($1, b, " ")
  print b[0], a[b[0]]
}' OFS="\t" file1 file2

file1 (tab-delimited)

chr5    86667863    86667879    RASA1   intron  16
chr5    86669977    86669995    RASA1   splicing    18
chr5    86670703    86670805    RASA1   exon    102
chr5    86679453    86679547    RASA1   intron  94
chr5    86679571    86679673    RASA1   exon    102
chr19   15088950    15088961    NOTCH2  intron  50
chr19   15288950    15288961    NOTCH3  intron  11
chr19   15308240    15308275    NOTCH3  exon    35

file2 (space-delimited)

RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63

Desired output after file2 is updated

RASA1 222 `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0

Answer 1

Could you please try the following. It prints the output in the same order as the lines of Input_file2.

awk '
FNR==NR{
  if(!b[$1]++){
     c[++count]=$1
  }
  a[$1]
  next
}
($4 in a) && $5!="intron"{
  a[$4]+=$NF
}
END{
  for(i=1;i<=count;i++){
    print c[i],a[c[i]]?a[c[i]]:0
  }
}'  Input_file2   Input_file1

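The sample Input_file1 you posted does not look tab-delimited, contrary to your statement. If it really is tab-delimited, change Input_file2 Input_file1 -----> Input_file2 FS="\t" Input_file1. To get TAB-delimited output, also set OFS="\t" next to FS="\t"; piping the output through | column -t instead gives aligned columns, but they are padded with spaces rather than tabs.

The output is as follows.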

RASA1 222
NOTCH2 0
GIMAP8 0
NOTCH3 35
FOXF2 0
PRB3 0
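The tab-delimited variant described above can be sketched as a self-contained script. It rebuilds the sample files in a scratch directory (the `tmp` directory and the `result` variable are illustrative assumptions) and places `FS='\t' OFS='\t'` between the two file names, so the assignments take effect only after Input_file2 has been read:

```shell
tmp=$(mktemp -d)

# file1: sample rows from the question, converted to tab-delimited.
tr ' ' '\t' > "$tmp/file1" <<'EOF'
chr5 86667863 86667879 RASA1 intron 16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35
EOF

# file2: space-delimited, as in the question.
cat > "$tmp/file2" <<'EOF'
RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63
EOF

result=$(awk '
FNR==NR {                       # first file (file2): remember genes in input order
  if (!b[$1]++) c[++count] = $1
  a[$1]
  next
}
($4 in a) && $5 != "intron" {   # second file (file1): sum the last field per gene
  a[$4] += $NF
}
END {
  for (i = 1; i <= count; i++)
    print c[i], (a[c[i]] ? a[c[i]] : 0)
}' "$tmp/file2" FS='\t' OFS='\t' "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The values are the same as in the listing above; only the separator between gene and sum changes from a space to a tab.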

Answer 2

If I understand correctly, this should do what you expect:

$ awk 'NR==FNR {if ($5!="intron") a[$4]+=$3-$2; next}
       {$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated
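A sketch of the same idea run against the sample data, under the assumption that the intron test sits inside the first-file block so that every line of file1 reaches `next` and none of them is printed. The scratch directory and the `updated` variable are illustrative:

```shell
tmp=$(mktemp -d)

# file1: sample rows from the question, converted to tab-delimited.
tr ' ' '\t' > "$tmp/file1" <<'EOF'
chr5 86667863 86667879 RASA1 intron 16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35
EOF

# file2: space-delimited, as in the question.
cat > "$tmp/file2" <<'EOF'
RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63
EOF

updated=$(awk '
NR == FNR {                          # file1: accumulate $3-$2 for non-intron rows
  if ($5 != "intron") sum[$4] += $3 - $2
  next                               # never fall through to the file2 rule
}
{ $2 = ($1 in sum) ? sum[$1] : 0 }   # file2: replace $2, default to 0
1                                    # print the (rebuilt) record
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$updated"
rm -rf "$tmp"
```

Because assigning to `$2` rebuilds the record with the default OFS, the result stays space-delimited like the original file2. This version recomputes `$3-$2` instead of reading the sixth field, so it also works on the 5-column form of file1.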

awk to update file based on match and condition in another
