Question

I'm trying to write a Perl program that goes through every file in a given directory and counts how many times a specific string occurs in each file.

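The program scans DNA sequences and, depending on the orientation of each sequence, counts how often ATG appears on the forward strand or on the reverse complement. I know every sequence contains at least one ATG or CAT (the reverse complement of ATG), and many contain more, but my output file only ever shows zero or one. Any suggestions?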
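For reference, the reverse complement of a sequence is the sequence read backwards with each base swapped for its partner (A with T, C with G), which is why CAT stands in for ATG on the other strand. The small sketch below illustrates this; `reverse_complement` is a hypothetical helper name, not part of the original script. It also shows that the two orientation patterns used in the script are reverse complements of each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse the sequence, then swap every base for its complementary base.
sub reverse_complement {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

print reverse_complement('ATG'), "\n";               # prints CAT
print reverse_complement('GGTTGGACAAAATGG'), "\n";   # prints CCATTTTGTCCAACC
```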

P.S. Please ignore the unnecessary variables; I'm editing a script I wrote earlier.

Here is my code:

#!/usr/bin/perl

my @file=<*.fasta>;
for $file (@file) {


my $get_file = <../[ES]RR*/> or print "Could not find";
$check = substr($file, 0, 9);
$filename = substr ($get_file, 3, 20);




my $pattern_reverse = 'CCATTTTGTCCAA[AC]C';
my $pattern = 'G[GT]TTGGACAAAATGG';
my $forward_start = 'ATG' ;
my $reverse_start = 'CAT' ;

open(DATA,$file) or die ("Couldn't open file.");

my $contig_name;
my $not_found_mark;
my $position;
my $symbol = ">";
my $contig_string;
my $contig_length;

$contig_name = <DATA>;
$not_found_mark = 1;
$contig_string = "";

while ((my $line = <DATA>) && ($not_found_mark)) {

chop($line);

$position = index($line,$symbol);
if ($position < 0) {
        $contig_string .= $line;
        }
else {
        $not_found_mark = 0;
        }
}


print "$filename \n";
$contig_length = length $contig_string;
print "The contig is $contig_length characters. \n";



if ($contig_string =~ /($pattern)/ ) {
        print "Found forward pattern.\n";
        if ( $contig_string =~ /(ATG)/ ) {
            $ATG_count = 0;
            $ATG_count++;
            open ( Match, ">>", ATG_match ) or die "Could not open ATG_match";
            print Match ">$filename $check $ATG_count \n" 
                or die "Could not append.";
            print "$ATG_count \n";

        }
}

elsif ( $contig_string =~ /($pattern_reverse)/ ) {
        print "Found reverse pattern.\n";
        if ( $contig_string =~ /(CAT)/ ) {
            $ATG_count = 0;
            $ATG_count++;
            open ( Match, ">>", ATG_match ) or die "Could not open ATG_match";
            print Match ">$filename $check $ATG_count \n" 
                or die "Could not append.";
            print "$ATG_count \n";
    }
}

else  {
        print "$file \n";
        print "Did not find pattern. \n";
        open ( Nomatch, ">>", no_ATG_match ) or die "Could not open";
        print Nomatch ">$filename $check\n" or die "Could not append";      
        }
}
print ( "There are $ATG_count ATG's \n" );
close ( Match );
close ( Nomatch );
close( DATA );

Answer 1

"Any suggestions?"

It looks like you are setting your count to 1 with these two lines:

$ATG_count = 0;
$ATG_count++;

Since you are using ++, I'm guessing that isn't what you meant to do.

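Declaring my $ATG_count = 0; near the top of the script is all the initialization you need; after that, just increment it with ++. (While you're at it, is there a reason you didn't start with use strict; use warnings;?)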
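A minimal sketch of that advice, using a made-up sample sequence rather than real data: the counter is initialized once, and a `while` loop over `m//g` increments it once per match. If you want a separate count for each file, reset the counter at the start of each pass through the file loop instead of only once at the top.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ATG_count = 0;   # initialize once, before any matching

# Hypothetical sample sequence, for illustration only.
my $contig_string = 'CCATGAAATGTTTATGC';

# In scalar context, m//g resumes where the previous match ended,
# so the loop body runs once for each non-overlapping ATG.
while ($contig_string =~ /ATG/g) {
    $ATG_count++;
}

print "There are $ATG_count ATG's\n";   # prints "There are 3 ATG's"
```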

You say that

"I'm editing a script I wrote earlier"

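Why? This seems like a simple task. Starting over and writing code that does exactly what you want, and that you understand, is likely easier than trying to bend code written for another purpose into doing this one.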
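A minimal from-scratch sketch along those lines. It rests on several assumptions that go beyond the question: the input is uppercase sequence in `*.fasta` files in the current directory; all contigs in a file are counted together (the original loop stops at the second header line); and results are printed to standard output instead of being appended to ATG_match / no_ATG_match. `count_starts` is a hypothetical helper name; the two orientation patterns are copied from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Orientation markers copied from the question.
my $pattern         = qr/G[GT]TTGGACAAAATGG/;
my $pattern_reverse = qr/CCATTTTGTCCAA[AC]C/;

# Return (orientation, count) for one sequence: count ATG if the forward
# marker is present, CAT (the reverse complement of ATG) if the reverse
# marker is present, and 0 if neither marker is found.
sub count_starts {
    my ($seq) = @_;
    if ($seq =~ $pattern) {
        my $n = () = $seq =~ /ATG/g;
        return ('forward', $n);
    }
    elsif ($seq =~ $pattern_reverse) {
        my $n = () = $seq =~ /CAT/g;
        return ('reverse', $n);
    }
    return ('none', 0);
}

for my $file (glob '*.fasta') {
    open my $fh, '<', $file or die "Couldn't open $file: $!";
    my $seq = '';
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^>/;    # skip FASTA header lines
        $seq .= $line;
    }
    close $fh;

    my ($orientation, $count) = count_starts($seq);
    print "$file\t$orientation\t$count\n";
}
```

Because the counter lives inside `count_starts`, each file gets its own fresh count, and nothing carries over between iterations.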

Answer 2

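To count the occurrences of a string, match globally, force the match into list context so it returns every match, and assign that list assignment to a scalar to get the count: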

$foo = () = $string =~ /regex/g;
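A short demonstration with a made-up sequence. Note the `/g` modifier: without it, a successful match in list context returns only the single value `(1)` (there are no capture groups), and a failed match returns an empty list, so the "count" could only ever be 0 or 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'ATGCCATGAAATG';

# () = ... imposes list context, so m//g returns every match; a list
# assignment evaluated in scalar context yields the number of elements
# on its right-hand side, which here is the number of matches.
my $count = () = $string =~ /ATG/g;

# Without /g, a successful match in list context returns just (1).
my $without_g = () = $string =~ /ATG/;

print "$count $without_g\n";   # prints "3 1"
```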
