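I run a few awk commands and need to change the first one, but I can't seem to get the syntax right. Basically, the IDs in list are used to search $3 of file; if they match and the Category field ($10, matched through (NF-1)) is reference standard, then the GeneID ($2) and RNA ($6) are written to update. That works when there is only one reference standard, but LAMA4 has two reference standards, so I need to use the reference standard ($10) row with the highest t# ($7) and p# ($9). I can't seem to work this into the awk below. I apologize for the long post; I tried to include all the details.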
list
TTR
LAMA4
DSP
file
#tax_id GeneID Symbol RSG LRG RNA t Protein p Category
9606 7276 TTR NG_009490.1 LRG_416 NM_000371.3 t1 NP_000362.1 p1 reference standard
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_002290.4 NP_002281.3 aligned: Selected
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_001105206.2 NP_001098676.2 aligned: Selected
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_001105207.2 NP_001098677.2 aligned: Selected
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_001105208.2 NP_001098678.1 aligned: Selected
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_001105209.2 NP_001098679.1 aligned: Selected
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_001105208.1 t1 NP_001098678.1 p1 reference standard
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_002290.3 t2 NP_002281.2 p2 reference standard
9606 1832 DSP NG_008803.1 LRG_423 NM_004415.3 NP_004406.2 aligned: Selected
9606 1832 DSP NG_008803.1 LRG_423 NM_001008844.2 NP_001008844.1 aligned: Selected
9606 1832 DSP NG_008803.1 LRG_423 NM_001319034.1 NP_001305963.1 aligned: Selected
9606 1832 DSP NG_008803.1 LRG_423 NM_004415.2 t1 NP_004406.2 p1 reference standard
awk
awk 'FNR==NR{a[$0];next} $(NF-1)$NF=="referencestandard" && $3 in a{print $3, ($5~/^NM_/?$5:$6)}' list file > update
Answer 0 (score: 1)
Not very pretty, but
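awk 'FNR==NR{a[$0];next}   # first file (list): remember each symbol
# second file: per symbol, keep the reference standard row whose t# ($7) and p# ($9)
# both compare greater than the best seen so far (compared as strings)
$(NF-1)$NF=="referencestandard" && $3 in a && $7>b[$3] && $9>c[$3]{d[$3]=$3 FS $6; b[$3]=$7; c[$3]=$9}
END{for(key in d){print d[key]}}' list file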
Output:
TTR NM_000371.3
LAMA4 NM_002290.3
DSP NM_004415.2
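
One caveat with the command above: $7 and $9 are compared as strings, so a value such as t10 would rank below t2. A possible variant is sketched here, assuming every t#/p# value is a single letter followed by an integer. It strips the letter and compares numerically, using p# only as a tie-breaker when two rows share the same t# (that tie-break rule is my assumption, not something the question specifies). The function name pick_best is hypothetical, and the sample rows are copied from the question's file.

```shell
#!/bin/sh
# Hypothetical helper: pick_best LIST FILE
# Prints "Symbol RNA" for the reference standard row with the highest
# numeric t# ($7); p# ($9) breaks ties. One line per symbol found in LIST.
pick_best() {
  awk '
    FNR == NR { want[$1]; next }              # first file: symbols to keep
    ($(NF-1) " " $NF) == "reference standard" && ($3 in want) {
      t = substr($7, 2) + 0                   # "t10" -> 10 (numeric compare)
      p = substr($9, 2) + 0                   # "p2"  -> 2
      if (!($3 in rna) || t > bt[$3] || (t == bt[$3] && p > bp[$3])) {
        rna[$3] = $6; bt[$3] = t; bp[$3] = p
      }
    }
    END { for (s in rna) print s, rna[s] }
  ' "$1" "$2" | sort                          # sort: for-in order is unspecified
}

# Small demonstration using rows taken from the question.
tmp=$(mktemp -d)
printf '%s\n' TTR LAMA4 DSP > "$tmp/list"
cat > "$tmp/file" <<'EOF'
9606 7276 TTR NG_009490.1 LRG_416 NM_000371.3 t1 NP_000362.1 p1 reference standard
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_002290.4 NP_002281.3 aligned: Selected
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_001105208.1 t1 NP_001098678.1 p1 reference standard
9606 3910 LAMA4 NG_008209.1 LRG_433 NM_002290.3 t2 NP_002281.2 p2 reference standard
9606 1832 DSP NG_008803.1 LRG_423 NM_004415.2 t1 NP_004406.2 p1 reference standard
EOF
pick_best "$tmp/list" "$tmp/file"
rm -rf "$tmp"
```

Note that the FNR==NR idiom misfires if list is empty, because the second file would then be treated as the first. Redirect the output to update as in the original command if that file is needed.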