Counting number of pattern matches in Perl

Time: 2016-10-20 13:02:59

Tags: regex perl multiple-matches

I am VERY new to Perl, and to programming in general. I have spent the past couple of days searching for how to count the number of pattern matches, but I have had a hard time understanding others' solutions and applying them to the code I have already written.

Basically, I have a sequence and I need to find all the patterns that match [TC]C[CT]GGAAGC.

I believe I have that part down, but I am stuck on counting the number of occurrences of each pattern match. Does anyone know how to edit the code I already have to do this? Any advice is welcome. Thanks!

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

# open fasta file for reading 
unless( open( FASTA, "<", '/scratch/Drosophila/dmel-all-chromosome-r6.02.fasta' )) {
    die "Can't open dmel-all-chromosome-r6.02.fasta for reading:", $!;
}

#split the fasta record
local $/ = ">";

#scan through fasta file 
while (<FASTA>) {
    chomp;
    if ( $_ =~ /^(.*?)$(.*)$/ms) {
        my $header = $1;
        my $seq = $2;
        $seq =~ s/\R//g; # \R removes line breaks
        while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
            print $1, "\n";
        }
    }
}

Update: I have added

my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
print scalar @matches;

to the code below. However, it seems to print 0 in front of each pattern match instead of printing the total number of pattern matches.

while (<FASTA>) {
    chomp;
    if ( $_ =~ /^(.*?)$(.*)$/ms) {
        my $header = $1;
        my $seq = $2;
        $seq =~ s/\R//g; # \R removes line breaks
        while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
            print $1, "\n";
            my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
            print scalar @matches;
        }
    }
}

Edit: I need the output to list every pattern match found, followed by the total number of matches. For example:

CCTGGAAGC

TCTGGAAGC

TCCGGAAGC

3 matches found

3 answers:

Answer 0 (score: 3)

"counting the number of occurrences of each pattern match"

my @matches = $string =~ /pattern/g;

The @matches array will contain all the matched parts. You can then get the count with:

print scalar @matches;

Or you could write it directly as:

my $matches = () = $string =~ /pattern/g;

I would suggest using the former, as you might need to check what was matched in the future (perhaps for debugging).
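As a small sketch of why the /g modifier matters in the second form (the sub names below are made up for this illustration): without /g, a list-context match with no capture groups returns at most one element, so the count can never exceed 1.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub count_with_g {
    my ($string) = @_;
    # List context plus /g: one element per match, counted via "= () =".
    my $count = () = $string =~ /John/g;
    return $count;
}

sub count_without_g {
    my ($string) = @_;
    # Without /g the match stops at the first hit, so this is 0 or 1.
    my $count = () = $string =~ /John/;
    return $count;
}

my $string = 'John Doe John Done';
print count_with_g($string), "\n";       # prints 2
print count_without_g($string), "\n";    # prints 1
```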

Example 1:

use strict;
use warnings;
my $string = 'John Doe John Done';
my $matches = () = $string =~ /John/g;
print $matches; #prints 2

Example 2:

use strict;
use warnings;
my $string = 'John Doe John Done';
my @matches = $string =~ /John/g;
print "@matches\n"; #prints John John
print scalar(@matches), "\n"; #prints 2

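Edit:

# In list context, /g returns every match at once, so use "if" here;
# a "while" around this assignment would never terminate.
if ( my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g ) {
    print "$_\n" for @matches;
    print "Count of matches: ", scalar @matches, "\n";
}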
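To produce the output format requested in the question (each match on its own line, then the total), one possible sketch follows. The sub name find_motifs and the test sequence are made up for illustration; only the pattern comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every match of the motif in $seq, in order.
sub find_motifs {
    my ($seq) = @_;
    # In list context, /g returns all captured substrings at once.
    my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
    return @matches;
}

# Made-up sequence containing three occurrences of the motif.
my $seq     = 'AACCTGGAAGCGGTCTGGAAGCTTTCCGGAAGCAA';
my @matches = find_motifs($seq);

print "$_\n" for @matches;
print scalar(@matches), " matches found\n";
```

Because the matches are collected once, outside any loop, the total is printed a single time rather than after every match.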

Answer 1 (score: 2)

The way your code is written, you have to count the matches yourself:

local $/ = ">";
my $count = 0;

#scan through fasta file 
while (<FASTA>) {
    chomp;
    if ( $_ =~ /^(.*?)$(.*)$/ms) {
        my $header = $1;
        my $seq = $2;
        $seq =~ s/\R//g; # \R removes line breaks
        while ( $seq =~ /([TC]C[CT]GGAAGC)/g) {
            print $1, "\n";
            $count++; # one more match found
        }
    }
}
print "Found $count matches\n";

That should do it.

HTH Georg
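If a count per distinct motif is also wanted (the question mentions "each pattern match"), the same loop can feed a hash instead of a single counter. This is only a sketch under assumptions: the sub name tally_motifs and the in-memory FASTA data are invented for illustration, and the record splitting is a simplified variant of the question's approach.

```perl
use strict;
use warnings;

# Hypothetical helper: tallies each distinct motif across all records
# read from an already-open FASTA filehandle.
sub tally_motifs {
    my ($fh) = @_;
    my %tally;
    local $/ = ">";    # read one FASTA record at a time
    while ( my $record = <$fh> ) {
        chomp $record;                  # strip the trailing ">"
        next unless $record =~ /\S/;    # skip the empty chunk before the first ">"
        my ( $header, $seq ) = split /\n/, $record, 2;
        next unless defined $seq;
        $seq =~ s/\s+//g;               # join the wrapped sequence lines
        $tally{$1}++ while $seq =~ /([TC]C[CT]GGAAGC)/g;
    }
    return %tally;
}

# Made-up two-record FASTA data, read through an in-memory filehandle.
my $fasta = ">rec1\nAACCTGGAAGCGG\nTCTGGAAGCTT\n>rec2\nCCTGGAAGC\n";
open( my $fh, '<', \$fasta ) or die "Can't open in-memory FASTA: $!";
my %tally = tally_motifs($fh);
close $fh;

my $total = 0;
for my $motif ( sort keys %tally ) {
    print "$motif\t$tally{$motif}\n";
    $total += $tally{$motif};
}
print "Found $total matches\n";
```

Since the sequence lines are joined before matching, a motif that wraps across two lines of the file is still counted.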

Answer 2 (score: 1)

my @count = ($seq  =~ /([TC]C[CT]GGAAGC)/g);
print scalar @count ;