Add a field to an awk command if the Category field is not unique

Date: 2016-04-28 16:32:11

Tags: awk

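I run a few awk commands and need to change the first one, but I can't seem to get the syntax right. Basically, the IDs in list are used to search $3 of file. If they match and the Category field ($10, tested in the command as $(NF-1)$NF) is "reference standard", then the gene symbol ($3) and the RNA accession ($6) are written to update. That works when there is only one reference standard, but LAMA4 has two reference standard rows, so I need to use the reference standard with the highest t# ($7) or p# ($9). I can't seem to work that into the awk below. Sorry for the long post; I tried to include all the details.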

list

TTR
LAMA4
DSP

file

#tax_id GeneID  Symbol  RSG LRG RNA t   Protein p   Category
9606    7276    TTR NG_009490.1 LRG_416 NM_000371.3 t1  NP_000362.1 p1  reference standard
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_002290.4     NP_002281.3     aligned: Selected
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_001105206.2      NP_001098676.2      aligned: Selected
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_001105207.2      NP_001098677.2      aligned: Selected
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_001105208.2      NP_001098678.1      aligned: Selected
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_001105209.2      NP_001098679.1      aligned: Selected
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_001105208.1  t1  NP_001098678.1  p1  reference standard
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_002290.3 t2  NP_002281.2 p2  reference standard
9606    1832    DSP NG_008803.1 LRG_423 NM_004415.3     NP_004406.2     aligned: Selected
9606    1832    DSP NG_008803.1 LRG_423 NM_001008844.2      NP_001008844.1      aligned: Selected
9606    1832    DSP NG_008803.1 LRG_423 NM_001319034.1      NP_001305963.1      aligned: Selected
9606    1832    DSP NG_008803.1 LRG_423 NM_004415.2 t1  NP_004406.2 p1  reference standard

awk

awk 'FNR==NR{a[$0];next} $(NF-1)$NF=="referencestandard" && $3 in a{print $3, ($5~/^NM_/?$5:$6)}' list file > update
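To reproduce the problem, here is a minimal sketch that runs the command above on a trimmed copy of the sample file (the temporary directory, the reduced data and the question_cmd function name are only for illustration). FNR==NR is true only while awk reads the first file, so the symbols in list become array keys. Default whitespace splitting turns the Category value "reference standard" into two fields, which is why the test concatenates $(NF-1) and $NF. Both LAMA4 reference standard rows pass that test, so LAMA4 is printed twice.

```shell
#!/bin/sh
# Reproduce the problem on a trimmed copy of the sample data.
# The temporary directory and the question_cmd name are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' TTR LAMA4 DSP > "$dir/list"

cat > "$dir/file" <<'EOF'
#tax_id GeneID  Symbol  RSG LRG RNA t   Protein p   Category
9606    7276    TTR NG_009490.1 LRG_416 NM_000371.3 t1  NP_000362.1 p1  reference standard
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_002290.4     NP_002281.3     aligned: Selected
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_001105208.1  t1  NP_001098678.1  p1  reference standard
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_002290.3 t2  NP_002281.2 p2  reference standard
9606    1832    DSP NG_008803.1 LRG_423 NM_004415.2 t1  NP_004406.2 p1  reference standard
EOF

# The command from the question, wrapped in a function.
question_cmd() {
    awk '
    FNR == NR { a[$0]; next }    # first file (list): remember each symbol
    # "reference standard" is split into two fields by default whitespace
    # splitting, so the last two fields are concatenated before comparing.
    ($(NF-1) $NF) == "referencestandard" && ($3 in a) {
        print $3, ($5 ~ /^NM_/ ? $5 : $6)
    }' "$1" "$2"
}

question_cmd "$dir/list" "$dir/file"
# Both LAMA4 reference standard rows match, so LAMA4 is printed twice.
```

With this data the function prints TTR NM_000371.3, LAMA4 NM_001105208.1, LAMA4 NM_002290.3 and DSP NM_004415.2, one per line, in file order.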

1 answer:

Answer 0 (score: 1)

Not pretty, but:

awk 'FNR==NR{a[$0];next} 
     $(NF-1)$NF=="referencestandard" && $3 in a && $7>b[$3] && $9>c[$3]{d[$3]=$3 FS $6; b[$3]=$7; c[$3]=$9}
     END{for(key in d){print d[key]}}' list file

Output:

TTR NM_000371.3
LAMA4 NM_002290.3
DSP NM_004415.2
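The answer compares $7 and $9 as strings. That ranks t2 above t1, but it would rank t10 below t2, and for (key in d) visits keys in an unspecified order, so the output order shown above is not guaranteed. One possible variant, sketched below, strips the leading t and compares the t# numerically, keeps the reference standard row with the highest t# per symbol, and prints in the order of list. It assumes the t# is always a t followed by digits and keeps the question's $5/$6 ternary for the RNA column; pick_ref is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: for each symbol in LIST, print the RNA accession of its
# "reference standard" row with the highest t# in FILE.
# pick_ref is a hypothetical helper name; t# is assumed to be "t" + digits.
pick_ref() {
    awk '
    FNR == NR { want[$0]; order[++n] = $0; next }   # first file: the list
    ($(NF-1) $NF) == "referencestandard" && ($3 in want) {
        # Same column logic as the question: an NM_ accession in $5 means
        # the row is shifted left by one, so the t# is in $6, not $7.
        if ($5 ~ /^NM_/) { rna = $5; t = $6 } else { rna = $6; t = $7 }
        num = substr(t, 2) + 0                       # "t10" -> 10 (numeric)
        if (!($3 in best) || num > best[$3]) {
            best[$3] = num
            rna_of[$3] = rna
        }
    }
    END {
        # Walk the list in its original order instead of using (key in array).
        for (i = 1; i <= n; i++) {
            if (order[i] in rna_of) print order[i], rna_of[order[i]]
        }
    }' "$1" "$2"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' TTR LAMA4 DSP > "$dir/list"
cat > "$dir/file" <<'EOF'
#tax_id GeneID  Symbol  RSG LRG RNA t   Protein p   Category
9606    7276    TTR NG_009490.1 LRG_416 NM_000371.3 t1  NP_000362.1 p1  reference standard
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_002290.4     NP_002281.3     aligned: Selected
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_001105208.1  t1  NP_001098678.1  p1  reference standard
9606    3910    LAMA4   NG_008209.1 LRG_433 NM_002290.3 t2  NP_002281.2 p2  reference standard
9606    1832    DSP NG_008803.1 LRG_423 NM_004415.2 t1  NP_004406.2 p1  reference standard
EOF

pick_ref "$dir/list" "$dir/file" > "$dir/update"
cat "$dir/update"
# TTR NM_000371.3
# LAMA4 NM_002290.3
# DSP NM_004415.2
```

Recording order[] while reading list costs one extra array, but it makes the output order deterministic regardless of which awk implementation runs the script.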