Based on two fields

Time: 2016-05-28 12:48:14

Tags: perl

I am using Perl to test a couple of conditions on each line of an input file. The one-liner below works for most records, but not all.

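In the current output, lines 2, 3 and 5 are correct, but lines 1 and 4 are not, probably because the STB field holds two comma-separated values instead of one. For example, STB=0.5,0.645036; rather than STB=0.590597;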

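I can't seem to figure out how to apply the same logic to both cases, that is, if STB >= 0.8 then append " STRAND BIAS", followed by the value of the FDP field and "reads".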

Some lines in the input file have a single STB value, and others have two.

Perl

perl -ple '/^[^#].*FDP=(\d+);.*STB=(\d+\.\d+);/ and $_.=($2 >= 0.8?" STRAND BIAS ":" GOOD ").$1." reads"' input > out

Input

chr1    93159358    .   CT  AC,C    51.3482 PASS    AF=0,0.538462;AO=4,12;DP=39;FAO=0,21;FDP=39;FR=.;FRO=18;FSAF=0,11;FSAR=0,10;FSRF=15;FSRR=3;FWDB=0.0379899,0.0954749;FXX=0;HRUN=1,5;LEN=2,1;MLLD=22.441,10.1519;OALT=AC,-;OID=.,.;OMAPALT=AC,C;OPOS=93159358,93159359;OREF=CT,T;PB=0.5,0.5;PBP=1,1;QD=5.26648;RBI=0.0698716,0.219287;REFB=-0.0299799,-0.0774582;REVB=0.0586414,0.197411;RO=22;SAF=0,9;SAR=4,3;SRF=17;SRR=5;SSEN=0,0;SSEP=0,0;SSSB=-0.747246,-0.0336118;STB=0.5,0.645036;STBP=1,0.086;TYPE=mnp,del;VARB=0.059091,0.135819;ANN=EVI5    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3
chr1    93073228    .   C   A   142.937 PASS    AF=0.4;AO=42;DP=105;FAO=42;FDP=105;FR=.;FRO=63;FSAF=25;FSAR=17;FSRF=28;FSRR=35;FWDB=-0.00213313;FXX=0.00943307;HRUN=2;LEN=1;MLLD=178.966;OALT=A;OID=.;OMAPALT=A;OPOS=93073228;OREF=C;PB=0.5;PBP=1;QD=5.44523;RBI=0.00753887;REFB=-0.0179184;REVB=-0.00723079;RO=63;SAF=25;SAR=17;SRF=28;SRR=35;SSEN=0;SSEP=0;SSSB=0.159972;STB=0.590597;STBP=0.144;TYPE=snp;VARB=0.0207923;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35
chr1    93089823    .   T   C   1038.33 PASS    AF=1;AO=110;DP=111;FAO=111;FDP=111;FR=.;FRO=0;FSAF=76;FSAR=35;FSRF=0;FSRR=0;FWDB=0.0247073;FXX=0.00892777;HRUN=2;LEN=1;MLLD=59.5565;OALT=C;OID=.;OMAPALT=C;OPOS=93089823;OREF=T;PB=0.5;PBP=1;QD=37.4173;RBI=0.025038;REFB=-0.0649256;REVB=-0.0040564;RO=1;SAF=75;SAR=35;SRF=1;SRR=0;SSEN=0;SSEP=0;SSSB=-0.00628837;STB=0.5;STBP=1;TYPE=snp;VARB=0.000880627;ANN=EVI5    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0
chr11   36596027    .   AG  AA,A    1031.71 PASS    AF=0.121875,0.703125;AO=52,118;DP=333;FAO=39,225;FDP=320;FR=.;FRO=56;FSAF=2,136;FSAR=37,89;FSRF=14;FSRR=42;FWDB=0.0148693,0.00188064;FXX=0.0615818;HRUN=5,5;LEN=1,1;MLLD=11.6837,10.3394;OALT=A,-;OID=.,.;OMAPALT=AA,A;OPOS=36596028,36596028;OREF=G,G;PB=0.5,0.5;PBP=1,1;QD=12.8964;RBI=0.065829,0.11083;REFB=-0.0510698,-0.110624;REVB=0.0641277,0.110814;RO=85;SAF=2,84;SAR=50,34;SRF=17;SRR=68;SSEN=0,0;SSEP=0,0.328125;SSSB=-0.504054,0.394265;STB=0.789128,0.571642;STBP=0.007,0;TYPE=snp,del;VARB=0.0245642,0.134756;ANN=RAG1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42
chr11   95825383    .   C   T   143.023 PASS    AF=0.47561;AO=28;DP=71;FAO=39;FDP=82;FR=.;FRO=43;FSAF=6;FSAR=33;FSRF=40;FSRR=3;FWDB=-0.0301041;FXX=0.0238067;HRUN=1;LEN=1;MLLD=189.321;OALT=T;OID=.;OMAPALT=T;OPOS=95825383;OREF=C;PB=0.5;PBP=1;QD=6.97675;RBI=0.153139;REFB=-0.0165525;REVB=0.150151;RO=43;SAF=6;SAR=22;SRF=40;SRR=3;SSEN=0;SSEP=0;SSSB=-0.666847;STB=0.875258;STBP=0;TYPE=snp;VARB=0.0275999;ANN=MAML2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3

Current output (lines 2, 3 and 5 are correct)

line 1 STB=0.5,0.645036
line 4 STB=0.789128,0.571642
chr1    93159358    .   CT  AC,C    51.3482 PASS    AF=0,0.538462;AO=4,12;DP=39;FAO=0,21;FDP=39;FR=.;FRO=18;FSAF=0,11;FSAR=0,10;FSRF=15;FSRR=3;FWDB=0.0379899,0.0954749;FXX=0;HRUN=1,5;LEN=2,1;MLLD=22.441,10.1519;OALT=AC,-;OID=.,.;OMAPALT=AC,C;OPOS=93159358,93159359;OREF=CT,T;PB=0.5,0.5;PBP=1,1;QD=5.26648;RBI=0.0698716,0.219287;REFB=-0.0299799,-0.0774582;REVB=0.0586414,0.197411;RO=22;SAF=0,9;SAR=4,3;SRF=17;SRR=5;SSEN=0,0;SSEP=0,0;SSSB=-0.747246,-0.0336118;STB=0.5,0.645036;STBP=1,0.086;TYPE=mnp,del;VARB=0.059091,0.135819;ANN=EVI5    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3
chr1    93073228    .   C   A   142.937 PASS    AF=0.4;AO=42;DP=105;FAO=42;FDP=105;FR=.;FRO=63;FSAF=25;FSAR=17;FSRF=28;FSRR=35;FWDB=-0.00213313;FXX=0.00943307;HRUN=2;LEN=1;MLLD=178.966;OALT=A;OID=.;OMAPALT=A;OPOS=93073228;OREF=C;PB=0.5;PBP=1;QD=5.44523;RBI=0.00753887;REFB=-0.0179184;REVB=-0.00723079;RO=63;SAF=25;SAR=17;SRF=28;SRR=35;SSEN=0;SSEP=0;SSSB=0.159972;STB=0.590597;STBP=0.144;TYPE=snp;VARB=0.0207923;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 GOOD 105 reads
chr1    93089823    .   T   C   1038.33 PASS    AF=1;AO=110;DP=111;FAO=111;FDP=111;FR=.;FRO=0;FSAF=76;FSAR=35;FSRF=0;FSRR=0;FWDB=0.0247073;FXX=0.00892777;HRUN=2;LEN=1;MLLD=59.5565;OALT=C;OID=.;OMAPALT=C;OPOS=93089823;OREF=T;PB=0.5;PBP=1;QD=37.4173;RBI=0.025038;REFB=-0.0649256;REVB=-0.0040564;RO=1;SAF=75;SAR=35;SRF=1;SRR=0;SSEN=0;SSEP=0;SSSB=-0.00628837;STB=0.5;STBP=1;TYPE=snp;VARB=0.000880627;ANN=EVI5    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 GOOD 111 reads
chr11   36596027    .   AG  AA,A    1031.71 PASS    AF=0.121875,0.703125;AO=52,118;DP=333;FAO=39,225;FDP=320;FR=.;FRO=56;FSAF=2,136;FSAR=37,89;FSRF=14;FSRR=42;FWDB=0.0148693,0.00188064;FXX=0.0615818;HRUN=5,5;LEN=1,1;MLLD=11.6837,10.3394;OALT=A,-;OID=.,.;OMAPALT=AA,A;OPOS=36596028,36596028;OREF=G,G;PB=0.5,0.5;PBP=1,1;QD=12.8964;RBI=0.065829,0.11083;REFB=-0.0510698,-0.110624;REVB=0.0641277,0.110814;RO=85;SAF=2,84;SAR=50,34;SRF=17;SRR=68;SSEN=0,0;SSEP=0,0.328125;SSSB=-0.504054,0.394265;STB=0.789128,0.571642;STBP=0.007,0;TYPE=snp,del;VARB=0.0245642,0.134756;ANN=RAG1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42
chr11   95825383    .   C   T   143.023 PASS    AF=0.47561;AO=28;DP=71;FAO=39;FDP=82;FR=.;FRO=43;FSAF=6;FSAR=33;FSRF=40;FSRR=3;FWDB=-0.0301041;FXX=0.0238067;HRUN=1;LEN=1;MLLD=189.321;OALT=T;OID=.;OMAPALT=T;OPOS=95825383;OREF=C;PB=0.5;PBP=1;QD=6.97675;RBI=0.153139;REFB=-0.0165525;REVB=0.150151;RO=43;SAF=6;SAR=22;SRF=40;SRR=3;SSEN=0;SSEP=0;SSSB=-0.666847;STB=0.875258;STBP=0;TYPE=snp;VARB=0.0275999;ANN=MAML2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 STRAND BIAS 82 reads

Desired output

chr1    93159358    .   CT  AC,C    51.3482 PASS    AF=0,0.538462;AO=4,12;DP=39;FAO=0,21;FDP=39;FR=.;FRO=18;FSAF=0,11;FSAR=0,10;FSRF=15;FSRR=3;FWDB=0.0379899,0.0954749;FXX=0;HRUN=1,5;LEN=2,1;MLLD=22.441,10.1519;OALT=AC,-;OID=.,.;OMAPALT=AC,C;OPOS=93159358,93159359;OREF=CT,T;PB=0.5,0.5;PBP=1,1;QD=5.26648;RBI=0.0698716,0.219287;REFB=-0.0299799,-0.0774582;REVB=0.0586414,0.197411;RO=22;SAF=0,9;SAR=4,3;SRF=17;SRR=5;SSEN=0,0;SSEP=0,0;SSSB=-0.747246,-0.0336118;STB=0.5,0.645036;STBP=1,0.086;TYPE=mnp,del;VARB=0.059091,0.135819;ANN=EVI5    GOOD 39 Reads GOOD readsGT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3
chr1    93073228    .   C   A   142.937 PASS    AF=0.4;AO=42;DP=105;FAO=42;FDP=105;FR=.;FRO=63;FSAF=25;FSAR=17;FSRF=28;FSRR=35;FWDB=-0.00213313;FXX=0.00943307;HRUN=2;LEN=1;MLLD=178.966;OALT=A;OID=.;OMAPALT=A;OPOS=93073228;OREF=C;PB=0.5;PBP=1;QD=5.44523;RBI=0.00753887;REFB=-0.0179184;REVB=-0.00723079;RO=63;SAF=25;SAR=17;SRF=28;SRR=35;SSEN=0;SSEP=0;SSSB=0.159972;STB=0.590597;STBP=0.144;TYPE=snp;VARB=0.0207923;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 GOOD 105 reads
chr1    93089823    .   T   C   1038.33 PASS    AF=1;AO=110;DP=111;FAO=111;FDP=111;FR=.;FRO=0;FSAF=76;FSAR=35;FSRF=0;FSRR=0;FWDB=0.0247073;FXX=0.00892777;HRUN=2;LEN=1;MLLD=59.5565;OALT=C;OID=.;OMAPALT=C;OPOS=93089823;OREF=T;PB=0.5;PBP=1;QD=37.4173;RBI=0.025038;REFB=-0.0649256;REVB=-0.0040564;RO=1;SAF=75;SAR=35;SRF=1;SRR=0;SSEN=0;SSEP=0;SSSB=-0.00628837;STB=0.5;STBP=1;TYPE=snp;VARB=0.000880627;ANN=EVI5    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 GOOD 111 reads
chr11   36596027    .   AG  AA,A    1031.71 PASS    AF=0.121875,0.703125;AO=52,118;DP=333;FAO=39,225;FDP=320;FR=.;FRO=56;FSAF=2,136;FSAR=37,89;FSRF=14;FSRR=42;FWDB=0.0148693,0.00188064;FXX=0.0615818;HRUN=5,5;LEN=1,1;MLLD=11.6837,10.3394;OALT=A,-;OID=.,.;OMAPALT=AA,A;OPOS=36596028,36596028;OREF=G,G;PB=0.5,0.5;PBP=1,1;QD=12.8964;RBI=0.065829,0.11083;REFB=-0.0510698,-0.110624;REVB=0.0641277,0.110814;RO=85;SAF=2,84;SAR=50,34;SRF=17;SRR=68;SSEN=0,0;SSEP=0,0.328125;SSSB=-0.504054,0.394265;STB=0.789128,0.571642;STBP=0.007,0;TYPE=snp,del;VARB=0.0245642,0.134756;ANN=RAG1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 GOOD 320 Reads GOOD
chr11   95825383    .   C   T   143.023 PASS    AF=0.47561;AO=28;DP=71;FAO=39;FDP=82;FR=.;FRO=43;FSAF=6;FSAR=33;FSRF=40;FSRR=3;FWDB=-0.0301041;FXX=0.0238067;HRUN=1;LEN=1;MLLD=189.321;OALT=T;OID=.;OMAPALT=T;OPOS=95825383;OREF=C;PB=0.5;PBP=1;QD=6.97675;RBI=0.153139;REFB=-0.0165525;REVB=0.150151;RO=43;SAF=6;SAR=22;SRF=40;SRR=3;SSEN=0;SSEP=0;SSSB=-0.666847;STB=0.875258;STBP=0;TYPE=snp;VARB=0.0275999;ANN=MAML2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 STRAND BIAS 82 reads

1 answer:

Answer 0: (score: 1)

It would be really good to have more information

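On the face of it, as long as you are happy to ignore the second STB value, you can simply drop the requirement for a semicolon ; after the value. Like this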

perl -ple '/^[^#].*FDP=(\d+);.*STB=(\d+\.\d+)/ and $_.=($2 >= 0.8?" STRAND BIAS ":" GOOD ").$1." reads"'

But this is pretty awful Perl code and, as you have found, it is very hard to debug
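As a rough illustration of why the semicolon matters, the sketch below runs a simplified version of the STB part of the pattern against two shortened INFO strings modelled on the question's records. The `capture_stb` helper and the shortened strings are assumptions made for this demo, not part of the original one-liner.

```perl
use strict;
use warnings;

# Two shortened INFO strings modelled on the question's records:
# one with a single STB value, one with a comma-separated pair.
my %samples = (
    single => 'FDP=105;FR=.;STB=0.590597;STBP=0.144',
    pair   => 'FDP=39;FR=.;STB=0.5,0.645036;STBP=1,0.086',
);

# Hypothetical helper: return the captured STB text,
# or undef if the pattern does not match.
sub capture_stb {
    my ($info, $pattern) = @_;
    return $info =~ $pattern ? $1 : undef;
}

my $strict  = qr/STB=(\d+\.\d+);/;   # as in the question: ';' must follow the value
my $relaxed = qr/STB=(\d+\.\d+)/;    # as in the answer: no ';' required

for my $name (sort keys %samples) {
    for my $case ( [ strict => $strict ], [ relaxed => $relaxed ] ) {
        my ($label, $pattern) = @$case;
        my $got = capture_stb($samples{$name}, $pattern);
        printf "%-6s %-7s %s\n", $name, $label, defined $got ? $got : '(no match)';
    }
}
```

With the strict pattern the pair record does not match at all, which is why nothing was appended to lines 1 and 4. With the relaxed pattern it matches, but only the first value (0.5) is captured, so the second value is never tested.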

I would prefer something like this: a script that extracts all of the tagged values into a hash and uses them directly from there

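I am guessing that you want all of the STB values to be less than 0.8 for a GOOD result, so I have used the all function from List::Util to test that. I simply split the STB value on commas and create a boolean $all_ok status variable that indicates whether this is true. This works fine whether there is one value or a thousand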

Then printf builds the output string from the components we have calculated

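I have emptied the $line variable so that we can see what is appended to each line for debugging. Just remove that statement, or comment it out, for a real run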

use strict;
use warnings 'all';

use List::Util 'all';

while ( <> ) {
    next unless /\S/;
    my @fields = split;

    chomp( my $line = $_ );

    my %values = map { split /=/ } split /;/, $fields[7];
    my $all_ok = all { $_ < 0.8 } split /,/, $values{STB};

    $line = '';  # for debugging

    printf "%s %s %s reads\n", $line, $all_ok ? 'GOOD' : 'STRAND BIAS', $values{FDP};
}

Output

 GOOD 39 reads
 GOOD 105 reads
 GOOD 111 reads
 GOOD 320 reads
 STRAND BIAS 82 reads
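The hash-building step can be made a little more defensive. The sketch below is one possible variant, not the answer's code: `parse_info` is a hypothetical helper, `SOMEFLAG` is an invented tag, and storing 1 for tags without an `=` is an assumption made here so that such a tag cannot leave the key/value list with an odd length.

```perl
use strict;
use warnings;

# Parse an INFO column ("KEY=VALUE;KEY=VALUE;FLAG") into a hash reference.
# Tags that carry no '=' are stored with a value of 1 (an assumption).
sub parse_info {
    my ($info) = @_;
    my %values;
    for my $item (split /;/, $info) {
        # Limit of 2 keeps any further '=' characters inside the value.
        my ($key, $value) = split /=/, $item, 2;
        $values{$key} = defined $value ? $value : 1;
    }
    return \%values;
}

# Shortened INFO string based on line 4 of the input; SOMEFLAG is invented.
my $values = parse_info('FDP=320;SOMEFLAG;STB=0.789128,0.571642;STBP=0.007,0');

if (defined $values->{STB}) {
    my @stb = split /,/, $values->{STB};
    print "FDP=$values->{FDP} STB values: @stb\n";
}
else {
    warn "no STB tag on this record\n";
}
```

Checking that STB is defined before splitting it avoids an "uninitialized value" warning on any record that lacks the tag.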


Update

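Now that you have explained that you want a list of status values in the output, I can write a better solution. You no longer need List::Util::all, but instead need to create an array of boolean status values from the list of STB data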

Don't forget to comment out $line = '' before running this on real data

It looks like this

use strict;
use warnings 'all';

while ( <> ) {
    next unless /\S/;
    my @fields = split;

    chomp( my $line = $_ );

    my %values = map { split /=/ } split /;/, $fields[7];

    my @stb_ok = map { $_ < 0.8 } split /,/, $values{STB};
    my @good   = map { $_ ? 'GOOD' : 'STRAND BIAS' } @stb_ok;

    $line = '';  # for debugging

    printf "%s %s %s reads\n", $line, "@good", $values{FDP};
}

Output

 GOOD GOOD 39 reads
 GOOD 105 reads
 GOOD 111 reads
 GOOD GOOD 320 reads
 STRAND BIAS 82 reads
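For testing the labelling logic separately from the file handling, the per-value mapping can be pulled out into small subroutines. This is a minimal sketch under that assumption; `stb_labels`, `annotation` and `$THRESHOLD` are names introduced here for illustration and do not appear in the script above.

```perl
use strict;
use warnings;

# Threshold taken from the question: STB >= 0.8 counts as strand bias.
my $THRESHOLD = 0.8;

# Map a comma-separated STB string to one label per value.
sub stb_labels {
    my ($stb) = @_;
    return map { $_ < $THRESHOLD ? 'GOOD' : 'STRAND BIAS' } split /,/, $stb;
}

# Build the text appended to a record from its STB and FDP values.
sub annotation {
    my ($stb, $fdp) = @_;
    return join ' ', stb_labels($stb), "$fdp reads";
}

print annotation('0.5,0.645036', 39), "\n";   # prints "GOOD GOOD 39 reads"
print annotation('0.875258', 82), "\n";       # prints "STRAND BIAS 82 reads"
```

In the main loop, the result of `annotation($values{STB}, $values{FDP})` could then be printed after the chomped `$line`.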