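I'm having trouble parsing a GFF file with an awk one-liner. The output is filtered on column 1 ($1) as expected, but when I add the extra condition that the value must be greater than 50000 and less than 150000, awk does not apply it to my file. I'm misunderstanding something, and I'm not sure what.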
Input:
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA 18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
My incorrect output:
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
Expected output:
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
Answer 0 (score: 2)
This is the correct form of the condition. Note that only one record matches it:
$ awk '
$1 == "S10" && $4 >= 50000 && $4 <=150000 {
print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
Unless you want the records matching $1 == "S10" || $4 >= 50000 && $4 <= 150000 (i.e. using logical OR), but that brings in an extra record (the S07 line):
awk '
$1 == "S10" || $4 >= 50000 && $4 <=150000 {
print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
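In awk, && binds more tightly than ||, so the OR condition above is read as $1 == "S10" || ($4 >= 50000 && $4 <= 150000). The following is a minimal sketch of how the grouping changes the result, assuming the question's sample input; the temporary file and the variable names or_result and alt_result are illustrative only.

```shell
# Recreate the question's sample input in a temporary file (illustrative only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA 18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
EOF

# Explicit parentheses that match awk's default grouping (&& before ||):
# every S10 record, plus any record whose $4 lies in the range.
or_result=$(awk '$1 == "S10" || ($4 >= 50000 && $4 <= 150000)' "$tmp")
printf '%s\n' "$or_result"

echo "---"

# Grouping the OR first is a different filter: the upper bound now
# applies to the S10 records as well.
alt_result=$(awk '($1 == "S10" || $4 >= 50000) && $4 <= 150000' "$tmp")
printf '%s\n' "$alt_result"

rm -f "$tmp"
```

Writing the parentheses out, even where they match the default, makes the intended grouping obvious to the next reader.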
A nicer form of the first one:
$ awk '
BEGIN { OFS="\t" } # define OFS to \t
$1 == "S10" && $4 >= 50000 && $4 <=150000 {
$1=$1 # rebuild the record
print # output
}' file
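If the sequence name and the bounds change between runs, they can be passed in with -v instead of being hard-coded. This is a sketch under the same sample input; the variable names chrom, lo and hi and the temporary file are assumptions for illustration, not part of the original answer.

```shell
# Recreate the question's sample input in a temporary file (illustrative only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA 18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
EOF

# chrom, lo and hi are hypothetical names; -v sets them before the program runs.
filtered=$(awk -v chrom="S10" -v lo=50000 -v hi=150000 '
BEGIN { OFS = "\t" }                      # tab-separated output
$1 == chrom && $4 >= lo && $4 <= hi {     # numeric comparison on column 4
    $1 = $1                               # rebuild the record using OFS
    print
}' "$tmp")
printf '%s\n' "$filtered"

rm -f "$tmp"
```

One caveat: with the default field splitting, $1=$1 replaces every run of whitespace with a tab, so an attribute column that contains spaces would be split across several tab-separated fields. If the real file is tab-delimited, adding -F'\t' should avoid that.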