Question

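I have the following script (written by a colleague) that searches an input text (a DNA sequence) for a specific substring and essentially outputs the number of letters between consecutive occurrences of that substring: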

#!/usr/bin/perl

#read file in from input line
$infile = $ARGV[0];
open(TXT, "<$infile");

#open output stream
$outfile = $ARGV[1];
open(OUT, ">$outfile");

#initialize a blank string for the DNA sequence
$DNA = &read_fasta();

$len = length($DNA);
print "\n DNA Length is: $len \n";

#restriction enzyme match pattern
$pattern = "AGCT";

$match = 0;

while($DNA =~ /$pattern/gi)
{
    $match++;
}

print "\n Total DNA matches to AGCT are: $match \n";

# split the DNA sequence into an array of fragments
@cutarr = split(/$pattern/i, $DNA);

#write the fragments out to a file
foreach $str(@cutarr)
{
    $len = length($str);
    print OUT "$len \n";
}


# Subfunction to read in a fasta file
sub read_fasta
{
    $sequence = "";

    while(<TXT>)
    {
        $line = $_;

        #remove newline characters
        chomp($line);

         # discard fasta header line
        if($line =~ /^>/){ next }

        # append the line to the DNA sequence
        else { $sequence .= $line }
    }
    return($sequence);
}

print "DNA is: \n $sequence \n";

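I was wondering whether anyone could help me add a second search pattern, so that the script outputs the number of characters between any hits of the two searches, i.e. with $pattern1 = AGCT and $pattern2 = GATC and the input sequence:

GGGGCC-AGCT-GAGAGACC-GATC-GAGAGAGAG-AGCT-

The dashes are there only to show where the search hits are.

The output would then be: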

6
8
9

Thanks!

Kelly

Answer 1

You can try the following script:

use v5.12;
use autodie;

open(my $in, "<", shift);

open(my $out, ">", shift);

my $DNA = read_fasta($in);

print "DNA is: \n $$DNA \n";
my $len = length($$DNA);
print "\n DNA Length is: $len \n";

my @pats=qw( AGCT GATC );

for (@pats) {
    my $m = () = $$DNA =~ /$_/gi;
    print "\n Total DNA matches to $_ are: $m \n";
}

my $pat=join("|",@pats);

# split case-insensitively, consistent with the /gi match counts above
my @cutarr = split(/$pat/i, $$DNA);

#write the fragments out to a file
for (@cutarr) {
    my $len = length($_);
    print $out "$len \n";
}
close($out);
close($in);


# Subfunction to read in a fasta file
sub read_fasta {
    my ($in) = @_;
    my $sequence = "";

    while(<$in>) {
        my $line = $_;

        #remove newline characters
        chomp($line);

         # discard fasta header line
        if($line =~ /^>/){ next }

        # append the line to the DNA sequence
        else { $sequence .= $line }
    }
    return(\$sequence);
}
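
To check the idea against the sample sequence from the question, here is a minimal sketch that isolates the core step: join the patterns into one alternation and split on it. The helper name fragment_lengths is hypothetical and not part of the script above, and the dashes from the question are assumed to be purely illustrative, so they are left out of the test string.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the lengths of
# the fragments left after cutting $dna at every occurrence of any pattern.
sub fragment_lengths {
    my ($dna, @patterns) = @_;

    # Join the sites into a single alternation, e.g. AGCT|GATC.
    # quotemeta guards against regex metacharacters in a pattern.
    my $alternation = join '|', map { quotemeta($_) } @patterns;

    # split discards the matched sites and returns the text between them;
    # it also drops trailing empty fragments on its own.
    return map { length($_) } split /$alternation/i, $dna;
}

# Sample sequence from the question, with the illustrative dashes removed.
my $dna = 'GGGGCCAGCTGAGAGACCGATCGAGAGAGAGAGCT';

# Prints 6, 8 and 9, one per line.
print "$_\n" for fragment_lengths($dna, qw(AGCT GATC));
```

Note that split keeps a leading empty fragment, so a sequence that starts with one of the sites would report a 0 as its first length, while a site at the very end adds nothing to the output.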

Counting the number of characters between 2 different substrings - Perl