awk: update the value of a field in the out file using the contents of another file

Time: 2017-04-03 18:57:26

Tags: awk

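In the out.txt below I am trying to use awk to update the contents of $9. out.txt is created by the awk before the pipe |. If $9 contains a + or -, then $8 of out.txt is used as a lookup key against $2 of file2. When a match is found (there will always be one), the $3 value of file2 is used to update $9 of out.txt, separated by a :. So the original +6 in out.txt would become +6:NM_005101.3. The awk below is close, but there is a syntax error after the | that I cannot seem to fix. Thank you :).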

out.txt `tab-delimited`

R_Index Chr Start   End Ref Alt Func.IDP.refGene    Gene.IDP.refGene    GeneDetail.IDP.refGene  Inheritence ExonicFunc.IDP.refGene  AAChange.IDP.refGene
1   chr1    948846  948846  -   A   upstream    ISG15   -0  .   .   .
2   chr1    948870  948870  C   G   UTR5    ISG15   NM_005101.3:c.-84C>G    .   .
4   chr1    949925  949925  C   T   downstream  ISG15   +6  .   .   .
5   chr1    207646923   207646923   G   A   intronic    CR2 >50 .   .   .
8   chr1    948840  948840  -   C   upstream    ISG15   -6  .   .   .

file2 `space-delimited`

2 ISG15 NM_005101.3 948846-948956 949363-949919

desired output `tab-delimited`

R_Index Chr Start   End Ref Alt Func.IDP.refGene    Gene.IDP.refGene    GeneDetail.IDP.refGene  Inheritence ExonicFunc.IDP.refGene  AAChange.IDP.refGene
1   chr1    948846  948846  -   A   upstream    ISG15   -0:NM_005101.3  .   .   .
2   chr1    948870  948870  C   G   UTR5    ISG15   NM_005101.3:c.-84C>G    .   .
4   chr1    949925  949925  C   T   downstream  ISG15   +6:NM_005101.3  .   .   .
5   chr1    207646923   207646923   G   A   intronic    CR2 >50 .   .   .
8   chr1    948840  948840  -   C   upstream    ISG15   -6:NM_005101.3  .   .   .

Description

Lines 1, 3, and 5: `$9` is updated with `:` and the value of `$3` in `file2`.
Lines 2 and 4 are skipped because their `$9` does not start with a `+` or `-`.

awk

awk -v extra=50 -v OFS='\t' '
NR == FNR {
    count[$2] = $1
    for(i = 1; i <= $1; i++) {
        low[$2, i] = $(2 + 2 * i)
        high[$2, i] = $(3 + 2 * i)
        mid[$2, i] = (low[$2, i] + high[$2, i]) / 2
    }
    next
}
FNR != 1 && $9 == "." && $12 == "." && $8 in count {
    for(i = 1; i <= count[$8]; i++)
        if($4 >= (low[$8, i] - extra) && $4 <= (high[$8, i] + extra)) {
            if($4 > mid[$8, i]) {
                sign = "+"
                value = high[$8, i]
            }
            else {
                sign = "-"
                value = low[$8, i]
            }
            diff = (value > $4) ? value - $4 : $4 - value
            $9 = (diff > 50) ? ">50" : (sign diff)
            break
        }
    if(i > count[$8]) {
        $9 = ">50"
    }
}
1
' FS='[- ]' file2 FS='\t' file1 | awk if($6 == "-" || $6 == "+") printf ":" ; 'FNR==NR {a[$2]=$3; next} a[$8]{$3=a[$8]}1' OFS='\t' file2 > final.txt
bash: syntax error near unexpected token `('

1 answer:

Answer 0: (score: 1)

As far as I can tell, your awk code is fine; it is your bash usage that is wrong.

FS='[- ]' file2 FS='\t' file1 |
  awk if($6 == "-" || $6 == "+")
      printf ":" ;
  'FNR==NR {a[$2]=$3; next}
   a[$8]{$3=a[$8]}1' OFS='\t' file2 > final.txt
bash: syntax error near unexpected token `('

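I don't know what this is supposed to do. But one thing is certain: on the second line, the awk code needs to be quoted (awk 'if(....). The bash error message arises because bash is interpreting the (unquoted) awk code, and the ( after if is not a valid shell token there.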
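To make the quoting point concrete, here is a minimal sketch of how the second stage could look once the whole awk program sits inside one pair of single quotes. It is not the asker's final command: it assumes the first stage has already written the tab-delimited out.txt shown in the question, it treats "contains + or -" as "`$9` is a signed number" (an assumption about the intent), and the array name `id` is a hypothetical choice. The sample rows are copied from the question so the snippet runs on its own.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch real files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# file2 (space-delimited): $2 is the gene name, $3 is the transcript ID.
printf '%s\n' '2 ISG15 NM_005101.3 948846-948956 949363-949919' > file2

# out.txt (tab-delimited): rows from the question, spaces converted to tabs.
tr ' ' '\t' > out.txt <<'EOF'
R_Index Chr Start End Ref Alt Func.IDP.refGene Gene.IDP.refGene GeneDetail.IDP.refGene Inheritence ExonicFunc.IDP.refGene AAChange.IDP.refGene
1 chr1 948846 948846 - A upstream ISG15 -0 . . .
2 chr1 948870 948870 C G UTR5 ISG15 NM_005101.3:c.-84C>G . .
4 chr1 949925 949925 C T downstream ISG15 +6 . . .
5 chr1 207646923 207646923 G A intronic CR2 >50 . . .
8 chr1 948840 948840 - C upstream ISG15 -6 . . .
EOF

# The entire awk program is one single-quoted argument, so the shell never
# sees the parentheses or the $ signs inside it.
awk '
    # First file (file2): remember the transcript ID for each gene name.
    FNR == NR { id[$2] = $3; next }
    # Second file (out.txt): assumed rule, $9 is a signed number such as +6.
    $9 ~ /^[+-][0-9]+$/ && ($8 in id) { $9 = $9 ":" id[$8] }
    # Print every line, changed or not.
    { print }
' file2 FS='\t' OFS='\t' out.txt > final.txt

cat final.txt
```

The `FS='\t' OFS='\t'` assignments are placed between the two file names so that file2 is still split on whitespace while out.txt is read and rewritten with tabs. In the real pipeline, out.txt would be replaced by `-` (standard input) after the `|`, which is a detail left untested here.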