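I am using Perl to test a couple of conditions on each line of an input file. The one-liner below works for most records, but not all.
In the current output, lines 2, 3, and 5 are correct, but lines 1 and 4 are not, probably because the STB value on those lines holds two comma-separated values instead of one, for example STB=0.5,0.645036; rather than STB=0.590597;.
I can't figure out how to apply the same logic to both cases: when STB >= 0.8 the line should be tagged " STRAND BIAS" (otherwise " GOOD"), and the number before "reads" is the value of the FDP field.
The input file will contain some lines with one STB value and some lines with two.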
perl -ple '/^[^#].*FDP=(\d+);.*STB=(\d+\.\d+);/ and $_.=($2 >= 0.8?" STRAND BIAS ":" GOOD ").$1." reads"' input > out
chr1 93159358 . CT AC,C 51.3482 PASS AF=0,0.538462;AO=4,12;DP=39;FAO=0,21;FDP=39;FR=.;FRO=18;FSAF=0,11;FSAR=0,10;FSRF=15;FSRR=3;FWDB=0.0379899,0.0954749;FXX=0;HRUN=1,5;LEN=2,1;MLLD=22.441,10.1519;OALT=AC,-;OID=.,.;OMAPALT=AC,C;OPOS=93159358,93159359;OREF=CT,T;PB=0.5,0.5;PBP=1,1;QD=5.26648;RBI=0.0698716,0.219287;REFB=-0.0299799,-0.0774582;REVB=0.0586414,0.197411;RO=22;SAF=0,9;SAR=4,3;SRF=17;SRR=5;SSEN=0,0;SSEP=0,0;SSSB=-0.747246,-0.0336118;STB=0.5,0.645036;STBP=1,0.086;TYPE=mnp,del;VARB=0.059091,0.135819;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3
chr1 93073228 . C A 142.937 PASS AF=0.4;AO=42;DP=105;FAO=42;FDP=105;FR=.;FRO=63;FSAF=25;FSAR=17;FSRF=28;FSRR=35;FWDB=-0.00213313;FXX=0.00943307;HRUN=2;LEN=1;MLLD=178.966;OALT=A;OID=.;OMAPALT=A;OPOS=93073228;OREF=C;PB=0.5;PBP=1;QD=5.44523;RBI=0.00753887;REFB=-0.0179184;REVB=-0.00723079;RO=63;SAF=25;SAR=17;SRF=28;SRR=35;SSEN=0;SSEP=0;SSSB=0.159972;STB=0.590597;STBP=0.144;TYPE=snp;VARB=0.0207923;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35
chr1 93089823 . T C 1038.33 PASS AF=1;AO=110;DP=111;FAO=111;FDP=111;FR=.;FRO=0;FSAF=76;FSAR=35;FSRF=0;FSRR=0;FWDB=0.0247073;FXX=0.00892777;HRUN=2;LEN=1;MLLD=59.5565;OALT=C;OID=.;OMAPALT=C;OPOS=93089823;OREF=T;PB=0.5;PBP=1;QD=37.4173;RBI=0.025038;REFB=-0.0649256;REVB=-0.0040564;RO=1;SAF=75;SAR=35;SRF=1;SRR=0;SSEN=0;SSEP=0;SSSB=-0.00628837;STB=0.5;STBP=1;TYPE=snp;VARB=0.000880627;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0
chr11 36596027 . AG AA,A 1031.71 PASS AF=0.121875,0.703125;AO=52,118;DP=333;FAO=39,225;FDP=320;FR=.;FRO=56;FSAF=2,136;FSAR=37,89;FSRF=14;FSRR=42;FWDB=0.0148693,0.00188064;FXX=0.0615818;HRUN=5,5;LEN=1,1;MLLD=11.6837,10.3394;OALT=A,-;OID=.,.;OMAPALT=AA,A;OPOS=36596028,36596028;OREF=G,G;PB=0.5,0.5;PBP=1,1;QD=12.8964;RBI=0.065829,0.11083;REFB=-0.0510698,-0.110624;REVB=0.0641277,0.110814;RO=85;SAF=2,84;SAR=50,34;SRF=17;SRR=68;SSEN=0,0;SSEP=0,0.328125;SSSB=-0.504054,0.394265;STB=0.789128,0.571642;STBP=0.007,0;TYPE=snp,del;VARB=0.0245642,0.134756;ANN=RAG1 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42
chr11 95825383 . C T 143.023 PASS AF=0.47561;AO=28;DP=71;FAO=39;FDP=82;FR=.;FRO=43;FSAF=6;FSAR=33;FSRF=40;FSRR=3;FWDB=-0.0301041;FXX=0.0238067;HRUN=1;LEN=1;MLLD=189.321;OALT=T;OID=.;OMAPALT=T;OPOS=95825383;OREF=C;PB=0.5;PBP=1;QD=6.97675;RBI=0.153139;REFB=-0.0165525;REVB=0.150151;RO=43;SAF=6;SAR=22;SRF=40;SRR=3;SSEN=0;SSEP=0;SSSB=-0.666847;STB=0.875258;STBP=0;TYPE=snp;VARB=0.0275999;ANN=MAML2 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3
line 1 STB=0.5,0.645036
line 4 STB=0.789128,0.571642
chr1 93159358 . CT AC,C 51.3482 PASS AF=0,0.538462;AO=4,12;DP=39;FAO=0,21;FDP=39;FR=.;FRO=18;FSAF=0,11;FSAR=0,10;FSRF=15;FSRR=3;FWDB=0.0379899,0.0954749;FXX=0;HRUN=1,5;LEN=2,1;MLLD=22.441,10.1519;OALT=AC,-;OID=.,.;OMAPALT=AC,C;OPOS=93159358,93159359;OREF=CT,T;PB=0.5,0.5;PBP=1,1;QD=5.26648;RBI=0.0698716,0.219287;REFB=-0.0299799,-0.0774582;REVB=0.0586414,0.197411;RO=22;SAF=0,9;SAR=4,3;SRF=17;SRR=5;SSEN=0,0;SSEP=0,0;SSSB=-0.747246,-0.0336118;STB=0.5,0.645036;STBP=1,0.086;TYPE=mnp,del;VARB=0.059091,0.135819;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3
chr1 93073228 . C A 142.937 PASS AF=0.4;AO=42;DP=105;FAO=42;FDP=105;FR=.;FRO=63;FSAF=25;FSAR=17;FSRF=28;FSRR=35;FWDB=-0.00213313;FXX=0.00943307;HRUN=2;LEN=1;MLLD=178.966;OALT=A;OID=.;OMAPALT=A;OPOS=93073228;OREF=C;PB=0.5;PBP=1;QD=5.44523;RBI=0.00753887;REFB=-0.0179184;REVB=-0.00723079;RO=63;SAF=25;SAR=17;SRF=28;SRR=35;SSEN=0;SSEP=0;SSSB=0.159972;STB=0.590597;STBP=0.144;TYPE=snp;VARB=0.0207923;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 GOOD 105 reads
chr1 93089823 . T C 1038.33 PASS AF=1;AO=110;DP=111;FAO=111;FDP=111;FR=.;FRO=0;FSAF=76;FSAR=35;FSRF=0;FSRR=0;FWDB=0.0247073;FXX=0.00892777;HRUN=2;LEN=1;MLLD=59.5565;OALT=C;OID=.;OMAPALT=C;OPOS=93089823;OREF=T;PB=0.5;PBP=1;QD=37.4173;RBI=0.025038;REFB=-0.0649256;REVB=-0.0040564;RO=1;SAF=75;SAR=35;SRF=1;SRR=0;SSEN=0;SSEP=0;SSSB=-0.00628837;STB=0.5;STBP=1;TYPE=snp;VARB=0.000880627;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 GOOD 111 reads
chr11 36596027 . AG AA,A 1031.71 PASS AF=0.121875,0.703125;AO=52,118;DP=333;FAO=39,225;FDP=320;FR=.;FRO=56;FSAF=2,136;FSAR=37,89;FSRF=14;FSRR=42;FWDB=0.0148693,0.00188064;FXX=0.0615818;HRUN=5,5;LEN=1,1;MLLD=11.6837,10.3394;OALT=A,-;OID=.,.;OMAPALT=AA,A;OPOS=36596028,36596028;OREF=G,G;PB=0.5,0.5;PBP=1,1;QD=12.8964;RBI=0.065829,0.11083;REFB=-0.0510698,-0.110624;REVB=0.0641277,0.110814;RO=85;SAF=2,84;SAR=50,34;SRF=17;SRR=68;SSEN=0,0;SSEP=0,0.328125;SSSB=-0.504054,0.394265;STB=0.789128,0.571642;STBP=0.007,0;TYPE=snp,del;VARB=0.0245642,0.134756;ANN=RAG1 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42
chr11 95825383 . C T 143.023 PASS AF=0.47561;AO=28;DP=71;FAO=39;FDP=82;FR=.;FRO=43;FSAF=6;FSAR=33;FSRF=40;FSRR=3;FWDB=-0.0301041;FXX=0.0238067;HRUN=1;LEN=1;MLLD=189.321;OALT=T;OID=.;OMAPALT=T;OPOS=95825383;OREF=C;PB=0.5;PBP=1;QD=6.97675;RBI=0.153139;REFB=-0.0165525;REVB=0.150151;RO=43;SAF=6;SAR=22;SRF=40;SRR=3;SSEN=0;SSEP=0;SSSB=-0.666847;STB=0.875258;STBP=0;TYPE=snp;VARB=0.0275999;ANN=MAML2 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 STRAND BIAS 82 reads
chr1 93159358 . CT AC,C 51.3482 PASS AF=0,0.538462;AO=4,12;DP=39;FAO=0,21;FDP=39;FR=.;FRO=18;FSAF=0,11;FSAR=0,10;FSRF=15;FSRR=3;FWDB=0.0379899,0.0954749;FXX=0;HRUN=1,5;LEN=2,1;MLLD=22.441,10.1519;OALT=AC,-;OID=.,.;OMAPALT=AC,C;OPOS=93159358,93159359;OREF=CT,T;PB=0.5,0.5;PBP=1,1;QD=5.26648;RBI=0.0698716,0.219287;REFB=-0.0299799,-0.0774582;REVB=0.0586414,0.197411;RO=22;SAF=0,9;SAR=4,3;SRF=17;SRR=5;SSEN=0,0;SSEP=0,0;SSSB=-0.747246,-0.0336118;STB=0.5,0.645036;STBP=1,0.086;TYPE=mnp,del;VARB=0.059091,0.135819;ANN=EVI5 GOOD 39 Reads GOOD readsGT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3
chr1 93073228 . C A 142.937 PASS AF=0.4;AO=42;DP=105;FAO=42;FDP=105;FR=.;FRO=63;FSAF=25;FSAR=17;FSRF=28;FSRR=35;FWDB=-0.00213313;FXX=0.00943307;HRUN=2;LEN=1;MLLD=178.966;OALT=A;OID=.;OMAPALT=A;OPOS=93073228;OREF=C;PB=0.5;PBP=1;QD=5.44523;RBI=0.00753887;REFB=-0.0179184;REVB=-0.00723079;RO=63;SAF=25;SAR=17;SRF=28;SRR=35;SSEN=0;SSEP=0;SSSB=0.159972;STB=0.590597;STBP=0.144;TYPE=snp;VARB=0.0207923;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 GOOD 105 reads
chr1 93089823 . T C 1038.33 PASS AF=1;AO=110;DP=111;FAO=111;FDP=111;FR=.;FRO=0;FSAF=76;FSAR=35;FSRF=0;FSRR=0;FWDB=0.0247073;FXX=0.00892777;HRUN=2;LEN=1;MLLD=59.5565;OALT=C;OID=.;OMAPALT=C;OPOS=93089823;OREF=T;PB=0.5;PBP=1;QD=37.4173;RBI=0.025038;REFB=-0.0649256;REVB=-0.0040564;RO=1;SAF=75;SAR=35;SRF=1;SRR=0;SSEN=0;SSEP=0;SSSB=-0.00628837;STB=0.5;STBP=1;TYPE=snp;VARB=0.000880627;ANN=EVI5 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 GOOD 111 reads
chr11 36596027 . AG AA,A 1031.71 PASS AF=0.121875,0.703125;AO=52,118;DP=333;FAO=39,225;FDP=320;FR=.;FRO=56;FSAF=2,136;FSAR=37,89;FSRF=14;FSRR=42;FWDB=0.0148693,0.00188064;FXX=0.0615818;HRUN=5,5;LEN=1,1;MLLD=11.6837,10.3394;OALT=A,-;OID=.,.;OMAPALT=AA,A;OPOS=36596028,36596028;OREF=G,G;PB=0.5,0.5;PBP=1,1;QD=12.8964;RBI=0.065829,0.11083;REFB=-0.0510698,-0.110624;REVB=0.0641277,0.110814;RO=85;SAF=2,84;SAR=50,34;SRF=17;SRR=68;SSEN=0,0;SSEP=0,0.328125;SSSB=-0.504054,0.394265;STB=0.789128,0.571642;STBP=0.007,0;TYPE=snp,del;VARB=0.0245642,0.134756;ANN=RAG1 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 GOOD 320 Reads GOOD
chr11 95825383 . C T 143.023 PASS AF=0.47561;AO=28;DP=71;FAO=39;FDP=82;FR=.;FRO=43;FSAF=6;FSAR=33;FSRF=40;FSRR=3;FWDB=-0.0301041;FXX=0.0238067;HRUN=1;LEN=1;MLLD=189.321;OALT=T;OID=.;OMAPALT=T;OPOS=95825383;OREF=C;PB=0.5;PBP=1;QD=6.97675;RBI=0.153139;REFB=-0.0165525;REVB=0.150151;RO=43;SAF=6;SAR=22;SRF=40;SRR=3;SSEN=0;SSEP=0;SSSB=-0.666847;STB=0.875258;STBP=0;TYPE=snp;VARB=0.0275999;ANN=MAML2 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 STRAND BIAS 82 reads
Answer 0 (score: 1)
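It would be really great to have more information.
On the face of it, as long as you're happy to ignore the second value of STB, you can simply drop the requirement for a semicolon ; immediately after the value. Like this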
perl -ple '/^[^#].*FDP=(\d+);.*STB=(\d+\.\d+)/ and $_.=($2 >= 0.8?" STRAND BIAS ":" GOOD ").$1." reads"'
But this is pretty poor Perl code and, as you have found, it is hard to debug.
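To see why the original pattern skipped lines 1 and 4, here is a minimal sketch that runs both patterns against INFO strings abbreviated from the sample input. The helper name first_stb is only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the STB value captured by a pattern, or undef.
sub first_stb {
    my ( $pattern, $info ) = @_;
    return $info =~ $pattern ? $2 : undef;
}

# Original pattern: a semicolon must directly follow the first STB number.
my $strict = qr/FDP=(\d+);.*STB=(\d+\.\d+);/;
# Relaxed pattern: capture the first STB number and ignore what follows.
my $relaxed = qr/FDP=(\d+);.*STB=(\d+\.\d+)/;

# INFO strings abbreviated from lines 1 and 2 of the sample input.
my @info = (
    'FDP=39;FR=.;STB=0.5,0.645036;STBP=1,0.086;',
    'FDP=105;FR=.;STB=0.590597;STBP=0.144;',
);

for my $s (@info) {
    printf "strict: %s, relaxed: %s\n",
        first_stb( $strict,  $s ) // 'no match',
        first_stb( $relaxed, $s ) // 'no match';
}
```

On the two-value string the strict pattern finds a comma where it demands a semicolon, so nothing is appended to that line; the relaxed pattern matches but only ever sees the first value.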
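I would prefer something like this: a script that extracts all of the tag values into a hash and works directly from there.
I'm guessing that you want all of the STB values to be below 0.8 for a GOOD result, so I have used the all function from List::Util to test that. I simply split the STB value on commas and create a boolean $all_ok status variable that says whether this holds, regardless of whether there is one value or a thousand.
Then printf builds the output string from the components we have calculated.
I have emptied the $line variable so that we can see what is being appended to each line for debugging. Just remove that statement or comment it out for a real run.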
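use strict;
use warnings 'all';

use List::Util 'all';

while ( <> ) {
    next unless /\S/;    # skip blank lines
    next if /^#/;        # skip header lines, as the one-liner's /^[^#]/ does

    my @fields = split;  # whitespace-separated columns; INFO is the 8th
    chomp( my $line = $_ );

    # Hash of KEY => VALUE built from the semicolon-separated INFO column
    my %values = map { split /=/ } split /;/, $fields[7];

    # True only if every comma-separated STB value is below 0.8
    my $all_ok = all { $_ < 0.8 } split /,/, $values{STB};

    $line = '';          # for debugging

    printf "%s %s %s reads\n", $line, $all_ok ? 'GOOD' : 'STRAND BIAS', $values{FDP};
}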
GOOD 39 reads
GOOD 105 reads
GOOD 111 reads
GOOD 320 reads
STRAND BIAS 82 reads
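One caveat: map { split /=/ } assumes every INFO entry has the form KEY=VALUE. The sample data satisfies that, but VCF INFO columns may also carry flag entries with no =. Below is a rough sketch, not a drop-in replacement, of a more tolerant parse. The helper names parse_info and stb_label and the SOMEFLAG entry are made up for illustration.

```perl
use strict;
use warnings;
use List::Util 'all';

# Hypothetical helper: turn an INFO string into a hash.
# Entries without '=' (flags) are stored with the value 1.
sub parse_info {
    my ($info) = @_;
    my %values;
    for my $entry ( split /;/, $info ) {
        next unless length $entry;    # ignore empty entries such as ';;'
        my ( $key, $value ) = split /=/, $entry, 2;
        $values{$key} = defined $value ? $value : 1;
    }
    return %values;
}

# Hypothetical helper: 'GOOD' only if every STB value is below 0.8.
sub stb_label {
    my ($stb) = @_;
    return ( all { $_ < 0.8 } split /,/, $stb ) ? 'GOOD' : 'STRAND BIAS';
}

# SOMEFLAG is an invented flag entry, not part of the sample data.
my %values = parse_info('FDP=82;SOMEFLAG;STB=0.875258;STBP=0');
printf "%s %s reads\n", stb_label( $values{STB} ), $values{FDP};
```

The third argument to split limits each entry to two pieces, so a value that itself contains = stays intact.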
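Now that you have explained that you want a list of status values in the output, I can write a better solution. You no longer need List::Util::all; instead you need to build an array of boolean status values from the list of STB data.
Don't forget to comment out $line = '' before running this on real data.
It looks like this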
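use strict;
use warnings 'all';

while ( <> ) {
    next unless /\S/;    # skip blank lines
    next if /^#/;        # skip header lines, as the one-liner's /^[^#]/ does

    my @fields = split;  # whitespace-separated columns; INFO is the 8th
    chomp( my $line = $_ );

    # Hash of KEY => VALUE built from the semicolon-separated INFO column
    my %values = map { split /=/ } split /;/, $fields[7];

    # One boolean per STB value, then one label per boolean
    my @stb_ok = map { $_ < 0.8 } split /,/, $values{STB};
    my @good   = map { $_ ? 'GOOD' : 'STRAND BIAS' } @stb_ok;

    $line = '';          # for debugging

    printf "%s %s %s reads\n", $line, "@good", $values{FDP};
}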
GOOD GOOD 39 reads
GOOD 105 reads
GOOD 111 reads
GOOD GOOD 320 reads
STRAND BIAS 82 reads
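As a small sketch of the per-value labelling in isolation, the hypothetical helper stb_labels below applies the same < 0.8 test to a few STB strings. The first two are taken from the sample input; the third, exactly 0.8, is an invented boundary case.

```perl
use strict;
use warnings;

# Hypothetical helper: one label per comma-separated STB value.
sub stb_labels {
    my ($stb) = @_;
    return map { $_ < 0.8 ? 'GOOD' : 'STRAND BIAS' } split /,/, $stb;
}

for my $stb ( '0.789128,0.571642', '0.875258', '0.8' ) {
    my @labels = stb_labels($stb);
    print "$stb => @labels\n";
}
```

Because the test is a strict less-than, a value of exactly 0.8 is labelled STRAND BIAS, which agrees with the STB >= 0.8 rule in the question.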