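The Perl script below cuts the input sequence at each 'E', skips the specific 'E' positions listed in @nobreak, and produces an array of fragments as output. What I want instead is a script that produces one such array for each skipped position, working through all the positions in @nobreak. Say set 1 contains the fragments produced when the 'E' at position 37 is skipped, set 2 those produced when the 'E' at position 45 is skipped, and so on. The script I wrote, shown below, does not work properly. I want the output to contain 4 different arrays, taking one position from @nobreak at a time. Please help!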
my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';
print "Results of 1-Missed Cleavage:\n\n";
my @nobreak = (37, 45, 57, 59);
{
@nobreak = map { $_ - 1 } @nobreak;
foreach (@nobreak) {
substr($s, $_, 1) = "\0";
}
my @a = split /E(?!P)/, $s;
$_ =~ s/\0/E/g foreach (@a);
$result = join "E,", @a;
@final = split /,/, $result;
print "@final\n";
}
Answer 0 (score: 2)
To split the string at every 'E' without consuming it in the process, use a lookbehind:
my @final = split /(?<=E)/, $str;
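For finer control over which 'E's to split at (the question does not specify this), the regular expression would need to be changed.
If a variable-length lookbehind is needed, \K can be used.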
...
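As a minimal sketch of the difference the lookbehind makes, the following compares a plain split on 'E' with a split on the zero-width assertion. The short sequence and the variable names are made up for illustration and are not taken from the question.

```perl
use strict;
use warnings;

# Short made-up sequence, used only for illustration.
my $str = 'AAEBBECC';

# Splitting on the literal 'E' consumes the delimiter.
my @consumed = split /E/, $str;

# Splitting on a zero-width lookbehind keeps each 'E'
# at the end of the fragment it terminates.
my @kept = split /(?<=E)/, $str;

print "@consumed\n";   # AA BB CC
print "@kept\n";       # AAE BBE CC
```

Because the lookbehind matches a position rather than a character, there is no need to join the pieces back together with "E," and split them again.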
Answer 1 (score: 1)
Loop over @nobreak?
my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';
print "Results of 1-Missed Cleavage:\n\n";
my @nobreak = (37,45,57,59);
for my $nobreak (@nobreak) {
substr($s, $nobreak-1, 1) = "\0";
my @a = split(/E(?!P)/, $s);
substr($s, $nobreak-1, 1) = 'E';
$_ =~ s/\0/E/g foreach (@a);
my $result = join ("E,", @a);
my @final = split(/,/, $result);
print "@final\n";
}
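Answer 2 (score: 0)
You want to split the string after every E character, but not before any P character.
This code will do what you want. It works by changing the E at each offset in @nobreak to e (which is better for debugging than "\0") and splitting on /(?<=E)(?!P)/, that is, after an E but not before a P. Afterwards the e is changed back to E using tr/e/E/.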
use strict;
use warnings;
my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';
print "Results of 1-Missed Cleavage:\n\n";
my @nobreak = (37, 45, 57, 59);
for my $index (@nobreak) {
my $ss = $s;
substr($ss, $index-1, 1) = 'e';
my @final = split /(?<=E)(?!P)/, $ss;
tr/e/E/ for @final;
print "$_\n" for @final;
print "\n";
}
Output
Results of 1-Missed Cleavage:

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGE
RGFFYTPKTRRE
AE
DLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVE
ALYLVCGERGFFYTPKTRRE
AE
DLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVE
ALYLVCGE
RGFFYTPKTRREAE
DLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVE
ALYLVCGE
RGFFYTPKTRRE
AEDLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN
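
Since the question asks for four separate arrays rather than printed output, here is one possible sketch that keeps each set of fragments in a hash keyed by the skipped position. It is an alternative to the placeholder approach, not the method used in the answer above: the helper name cleave_skipping and the hash %sets are hypothetical, and the cut points are found with a global match and pos() instead of split.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answers): cut $seq after every
# 'E' that is not followed by 'P', except the 'E' at 1-based position $skip.
sub cleave_skipping {
    my ($seq, $skip) = @_;
    my @fragments;
    my $start = 0;
    while ($seq =~ /E(?!P)/g) {
        my $end = pos($seq);       # offset just past the 'E' == its 1-based position
        next if $end == $skip;     # missed cleavage: do not cut here
        push @fragments, substr($seq, $start, $end - $start);
        $start = $end;
    }
    # Keep whatever follows the last cut point.
    push @fragments, substr($seq, $start) if $start < length $seq;
    return @fragments;
}

my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';
my @nobreak = (37, 45, 57, 59);

# One array reference per skipped position.
my %sets;
$sets{$_} = [ cleave_skipping($s, $_) ] for @nobreak;

for my $pos (@nobreak) {
    print "Skipping E at $pos:\n";
    print "  $_\n" for @{ $sets{$pos} };
}
```

Each entry such as $sets{37} can then be passed on or compared without re-running the split, and the original string is never modified.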