I'm trying to write code that, given this string:
“ TTGCATCCCTAAAGGGATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCTTTGTGATCAA”
finds consecutive repeats (a.k.a. tandem repeats) of the substring ATC, counts them, and prints the message "Off" if there are more than 10.
Here is my code:
my @count = ($content =~ /ATC+/g);
print @count . " Repeat length\n";
$nrRepeats = scalar(@count);
if ($nrRepeats>10) {
print("Off\n");
}
else {
print("On\n");
}
The problem:
It counts every ATC substring in the string, not just the ones that are tandem-repeated.
Thanks a lot for your help!
Answer 0 (score: 4)
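Your question is a bit ambiguous, so I'll answer each interpretation separately.
If you want to determine whether the string contains more than 10 consecutive ATCs, you can use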
if ($content =~ /ATCATCATCATCATCATCATCATCATCATCATC/)
This regex can be written more compactly as
if ($content =~ /(?:ATC){11}/)
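A minimal sketch of the compact check wrapped in a complete script. The sub name has_long_atc_repeat and the GGG/TTT padded inputs are hypothetical; the point is that {11} only matches 11 ATC units that sit back to back.

```perl
use strict;
use warnings;

# Returns 1 if $dna contains at least 11 back-to-back copies of "ATC"
# (a tandem repeat longer than 10 units), otherwise 0.
sub has_long_atc_repeat {
    my ($dna) = @_;
    return $dna =~ /(?:ATC){11}/ ? 1 : 0;
}

# Hypothetical inputs: 11 and 10 contiguous ATC units between filler bases.
my $eleven = 'GGG' . ('ATC' x 11) . 'TTT';
my $ten    = 'GGG' . ('ATC' x 10) . 'TTT';

print has_long_atc_repeat($eleven) ? "Off\n" : "On\n";   # Off
print has_long_atc_repeat($ten)    ? "Off\n" : "On\n";   # On
```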
If you want to count the occurrences of at least 2 consecutive ATCs, you can use
my $count = () = $content =~ /(?:ATC){2,}/g;
if ($count > 10)
(See perldoc -q count.)
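To show what the `my $count = () = ...` line computes, here is a small sketch. The sub name count_tandem_runs and the sample input are hypothetical. Assigning the match list to an empty list, and that assignment to a scalar, yields the number of matches, so each separate run of two or more ATCs counts once.

```perl
use strict;
use warnings;

# Count how many separate runs of two or more consecutive "ATC" units
# occur in $dna. A list assignment in scalar context returns the number
# of elements on its right-hand side (the "countof" idiom).
sub count_tandem_runs {
    my ($dna) = @_;
    my $count = () = $dna =~ /(?:ATC){2,}/g;
    return $count;
}

# Hypothetical input: a run of 3, a lone ATC, then a run of 2.
my $dna = 'ATCATCATC' . 'GG' . 'ATC' . 'GG' . 'ATCATC';
print count_tandem_runs($dna), "\n";   # 2
```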
Answer 1 (score: 1)
Your regex /ATC+/g looks for AT followed by one or more C. I suspect what you want is this:
/(ATC(?:ATC)+)/g
which is ATC followed by one or more further ATCs.
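A quick side-by-side sketch of the two regexes on a hypothetical input, showing that in /ATC+/ the + applies only to the C, while the grouped version only matches back-to-back ATC units.

```perl
use strict;
use warnings;

# Hypothetical input: an ATCC, then a tandem run of three ATC units.
my $dna = 'GATCCG' . 'ATCATCATC' . 'TT';

# /ATC+/ means "A", "T", then one or more "C": every ATC (or ATCC, ...)
# is a separate match, whether or not it is part of a repeat.
my @naive = $dna =~ /ATC+/g;

# /(ATC(?:ATC)+)/ means ATC followed by one or more further ATC units,
# so only back-to-back repeats match, each whole run as one string.
my @tandem = $dna =~ /(ATC(?:ATC)+)/g;

print "naive:  @naive\n";    # naive:  ATCC ATC ATC ATC
print "tandem: @tandem\n";   # tandem: ATCATCATC
```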
Answer 2 (score: 1)
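Perl has a repetition operator, x, so repeated patterns don't have to be typed out by hand. You can write a repeated pattern as $pattern x $repetitions
or type the string directly as 'ATC' x 11.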
Besides matching with /(?:ATC){11}/ (as already suggested), here is another way to get Off:
print "Off\n" if $content =~ ("ATC" x 11);
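A sketch of the x operator used with a configurable unit and threshold. The names $unit, $min, off_by_index and off_by_regex are hypothetical. It contrasts a literal substring search with index() against a regex quantifier, where \Q...\E keeps the unit from being read as regex syntax.

```perl
use strict;
use warnings;

# The x operator repeats its left operand: 'ATC' x 3 eq 'ATCATCATC'.
my $unit = 'ATC';
my $min  = 11;    # "more than 10" repeats, per the question

# Two equivalent tests for $min contiguous copies of $unit:
# 1) a literal substring search with index() (-1 means "not found");
# 2) a regex quantifier, with \Q...\E so the unit is matched literally.
sub off_by_index { my ($dna) = @_; return index($dna, $unit x $min) >= 0 ? 1 : 0 }
sub off_by_regex { my ($dna) = @_; return $dna =~ /(?:\Q$unit\E){$min}/ ? 1 : 0 }

my $dna = 'GGG' . $unit x 12 . 'TTT';    # x binds tighter than .
printf "%s %s\n", off_by_index($dna), off_by_regex($dna);   # 1 1
```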
To match all tandem repeats of ATC [1] and print Off if any of them is longer than 10 repeats, you have to loop:
while ($content =~ /(ATC(?:ATC)+)/g) {
my $count = (length $1) / 3;
print "$count repeat length\n";
print "Off\n" if $count > 10;
}
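Otherwise (with a single match instead of a loop), for input such as $prefix . 'ATC' x 2 . $infix . 'ATC' x 11 . $postfix
the detection would stop at the first tandem repeat. The predefined capture variable $1 holds the matched run and is used to check its length.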
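A sketch of the same loop refactored into a sub that collects every run length, fed the prefix/infix/postfix shape described above. The sub name tandem_run_lengths and the GG/TT filler are hypothetical. Because a /g match in scalar context resumes where the previous match ended, the loop reaches the 11-unit run even though a 2-unit run comes first.

```perl
use strict;
use warnings;

# Collect the length (in ATC units) of every tandem run in $dna.
sub tandem_run_lengths {
    my ($dna) = @_;
    my @lengths;
    while ($dna =~ /(ATC(?:ATC)+)/g) {
        push @lengths, length($1) / 3;    # $1 holds the whole run
    }
    return @lengths;
}

# Hypothetical input shaped like prefix . ATC x 2 . infix . ATC x 11 . postfix
my $dna = 'GG' . 'ATC' x 2 . 'TT' . 'ATC' x 11 . 'GG';
my @lengths = tandem_run_lengths($dna);
print "@lengths\n";                       # 2 11

my $off = grep { $_ > 10 } @lengths;      # number of runs longer than 10
print $off ? "Off\n" : "On\n";            # Off
```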
[1] To count the total number of ATC occurrences, ignoring whether they are consecutive:
my $count = () = $content =~ /ATC/g;
print "count (total matches) $count\n";
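To make the difference from the footnote concrete, here is a sketch on a hypothetical input of 12 ATCs that are never adjacent. The total count exceeds 10 (which is why the question's original code misfires) while the longest tandem run is only 1. It uses max from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical input: 12 ATC units, each separated by a G.
my $dna = join 'G', ('ATC') x 12;

# Total occurrences, ignoring contiguity (the "countof" idiom).
my $total = () = $dna =~ /ATC/g;

# Longest tandem run, in ATC units; the leading 0 covers the no-match case.
my $longest = max(0, map { length($_) / 3 } $dna =~ /((?:ATC)+)/g);

print "total $total, longest run $longest\n";   # total 12, longest run 1
```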
Answer 3 (score: 0)
#!/usr/bin/env perl
use strict;
use warnings;
# The string with the text to match
my $content = "TTGCATCCCTAAAGGGATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCTTTGTGATCAA";
# Split the text at every point preceded or followed by ATC
my @array = split /(?:(?<=ATC)|(?=ATC))/, $content;
# Create an array whose first element is 0; each element holds the length of one run of consecutive ATCs
my @count = 0;
for (@array) {
if (/^ATC$/) {
# If ATC matches $_ increment by one the number of matches
$count[-1]++;
} else {
# Otherwise, if a run of ATC was being counted,
# close it by pushing a new counter set to 0
$count[-1] != 0 and push @count, 0;
}
}
# Initialize $max to 0 and $index to undef
my ($max,$index) = (0, undef);
for (keys @count) {
# If $max has less value than the current iterated sequence
# $max is updated to current value and so is $index
$max < $count[$_] and ($max, $index) = ($count[$_], $_);
}
# $index stays undefined if the string contains no ATC at all
defined $index and print "$max Repeat length\n";
# Print Off if the longest run has more than 10 repeats, as the question asks
print(($max>10?'Off':'On')."\n");
I think this is a good approach because it gives you more information, such as the number of repeats in each run.
Edit: updated the comments.
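For comparison, a shorter sketch that reaches the same longest-run figure without split or manual counters: a list-context /g match returns every run of ATCs, and max from the core List::Util module picks the longest. The sub name longest_atc_run is hypothetical, and the threshold follows the question ("more than 10"). This works cleanly because ATC cannot overlap itself, so each greedy match is exactly one whole run.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Length, in ATC units, of the longest run of consecutive ATCs in $dna
# (0 when ATC does not occur at all).
sub longest_atc_run {
    my ($dna) = @_;
    # In list context, /((?:ATC)+)/g returns each run of ATCs as one string.
    return max(0, map { length($_) / 3 } $dna =~ /((?:ATC)+)/g);
}

my $content = "TTGCATCCCTAAAGGGATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCATCTTTGTGATCAA";
my $max = longest_atc_run($content);
print "$max Repeat length\n";
print $max > 10 ? "Off\n" : "On\n";
```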