Parsing using multiple arguments - Awk

Time: 2018-05-22 01:59:25

Tags: parsing awk gff

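I'm having trouble parsing a GFF file, using awk as a one-liner. I get output filtered on column 1 ($1), but when I add an extra filter for values greater than 5000 and less than 150000, awk no longer filters my file. I'm misunderstanding something, but I'm not sure what.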

Input:

S03       GeneWise        mRNA    7000       84000     40.00   -       .       ID=NA;Source=NA;Function="NA";
S07       GeneWise        CDS     80450       96070     .       -       0       Parent=NA;
S10       GeneWise        mRNA    96000       105032     50.00   -       .       ID=NA;Source=NA;Function="NA";
S10       GeneWise        CDS     43800       76000     .       -       0       Parent=NA;
S10      GeneWise        mRNA    175032       190540     41.11   +       .       ID=NA;Source=NA;Function="NA";
S11       GeneWise        CDS     3700       15000     .       +       0       Parent=NA;
S15       GeneWise        mRNA    18055       25000     40.00   -       .       ID=S15;Source=NA;Function="NA";

My incorrect output:

S10       GeneWise        mRNA    96000       105032     50.00   -       .       ID=NA;Source=NA;Function="NA";
S10       GeneWise        CDS     43800       76000     .       -       0       Parent=NA;
S10      GeneWise        mRNA    175032       190540     41.11   +       .       ID=NA;Source=NA;Function="NA";

Expected output:

S10       GeneWise        mRNA    96000       105032     50.00   -       .       ID=NA;Source=NA;Function="NA";

1 answer:

Answer 0 (score: 2)

That is the correct form of the condition. However, only one record matches it:

$ awk ' 
$1 == "S10" && $4 >= 50000 && $4 <=150000 { 
    print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S10     GeneWise        mRNA    96000   105032  50.00   -       .       ID=NA;Source=NA;Function="NA";

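Unless you want the records matching $1 == "S10" || $4 >= 50000 && $4 <= 150000 (i.e. using a logical OR), but that brings in one extra record, the S07 row, whose column 4 (80450) falls inside the range: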

$ awk '
$1 == "S10" || $4 >= 50000 && $4 <=150000 { 
    print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S07     GeneWise        CDS     80450   96070   .       -       0       Parent=NA;
S10     GeneWise        mRNA    96000   105032  50.00   -       .       ID=NA;Source=NA;Function="NA";
S10     GeneWise        CDS     43800   76000   .       -       0       Parent=NA;
S10     GeneWise        mRNA    175032  190540  41.11   +       .       ID=NA;Source=NA;Function="NA";
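In awk, && binds more tightly than ||, so the unparenthesized OR condition above is read as $1 == "S10" || ($4 >= 50000 && $4 <= 150000). A minimal sketch making both possible groupings explicit, run on a few rows copied from the question (cut down to columns 1 to 5 for brevity; the sample function name is just an illustrative helper):

```shell
#!/bin/sh
# A few rows from the question's sample, reduced to columns 1-5
sample() {
  printf '%s\n' \
    'S07 GeneWise CDS 80450 96070' \
    'S10 GeneWise mRNA 96000 105032' \
    'S10 GeneWise CDS 43800 76000' \
    'S10 GeneWise mRNA 175032 190540' \
    'S11 GeneWise CDS 3700 15000'
}

# How awk reads the unparenthesized condition: && groups before ||,
# so every S10 row passes, plus any row whose $4 is in range (S07 here)
echo '# $1 == "S10" || ($4 >= 50000 && $4 <= 150000)'
sample | awk '$1 == "S10" || ($4 >= 50000 && $4 <= 150000)'

# Forcing the other grouping: the upper bound now applies to S10 rows too,
# so the S10 row starting at 175032 is dropped
echo '# ($1 == "S10" || $4 >= 50000) && $4 <= 150000'
sample | awk '($1 == "S10" || $4 >= 50000) && $4 <= 150000'
```

When mixing && and || in one pattern, writing the parentheses out is cheap and removes any doubt about which tests apply to which rows.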

A better way to write the first form:

$ awk ' 
BEGIN { OFS="\t" }                           # define OFS to \t
$1 == "S10" && $4 >= 50000 && $4 <=150000 { 
    $1=$1                                    # rebuild the record
    print                                    # output
}' file
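The sequence ID and the two bounds can also be passed in from the shell with awk's standard -v option instead of being hard-coded in the program. A minimal sketch; the helper names filter_gff and sample_rows and the awk variable names seqid, min and max are illustrative choices, not part of the original answer. The + 0 forces the bounds to be compared as numbers:

```shell
#!/bin/sh
# Hypothetical helper: filter_gff SEQID MIN MAX, reading GFF lines on stdin.
# Keeps rows whose column 1 equals SEQID and whose column 4 lies in [MIN, MAX],
# and prints them tab-separated.
filter_gff() {
  awk -v seqid="$1" -v min="$2" -v max="$3" '
  BEGIN { OFS = "\t" }                        # join output fields with tabs
  $1 == seqid && $4 >= min + 0 && $4 <= max + 0 {
      $1 = $1                                 # assigning a field rebuilds $0 with OFS
      print
  }'
}

# Three S10 rows from the question's sample
sample_rows() {
  printf '%s\n' \
    'S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";' \
    'S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;' \
    'S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";'
}

sample_rows | filter_gff S10 50000 150000   # only the row starting at 96000
sample_rows | filter_gff S10 5000 150000    # rows starting at 96000 and 43800
```

With a lower bound of 5000, as stated in the question, the S10 row starting at 43800 also passes; the answer's examples use 50000, which leaves only the row starting at 96000. For a real file, redirect it into the helper, e.g. filter_gff S10 50000 150000 < file.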