Question

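I am a graduate student and relatively new to computational biology. I recently started using Perl, which is not the easiest language to learn, at least not for me.

I need help applying my ideas and logic correctly to work out a solution to this problem.

I have a DNA string that I want to split at specific sites, using information from an enzyme file whose lines contain the recognition sites, so that I get multiple fragments. Once I have the fragments, I want to write the list of DNA fragments to an output file. I want to create one output file for each line of the enzyme file, using the information extracted from that line and applied to the DNA string.

Here is what I mean:

Hypothetical scenario:

Enzyme.File contains: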

abc/at'gtct// (abc is the name of the enzyme; atgtct is the recognition site.)

def/cgg'ataaa// ........

Suppose the DNA string is: $dna = "accggttatgtctaaacggataaagtctcggataaattt" (the recognition site is atgtct)

Line 1: When I extract the information from the first line/enzyme (abc) of the enzyme file and apply it to this string, the output should be:

accggttat gtctaaacggataaagtctcggataaattt

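(split between at'gtct) The apostrophe marks the cut point. (Note: even though there is another gtct in the string, it is not split there, because at must come directly before it.)

Line 2: $dna = accggttatgtctaaa cggataaa gtct cggataaa ttt (the information is applied to the same DNA string)

The information from line/enzyme 2 (def) will split the DNA as follows:

accggttatgtctaaacgg (split between cgg'ataaa) ataaagtctcgg ataaattt

I want to put the output for each line into a separate file with a different name. (I can take care of choosing the names.)

In summary, this example would create two new files, named "abc_whatever" and "def_whatever". Important: if the enzyme file has 8 lines with different enzymes, I should get 8 new output files with different DNA fragments.

Here is what I have tried so far: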

#!/usr/bin/perl;

use warnings;
use strict;


open(ENZ,$ARGV[0]) || die; # ENZ(file handle for enzyme file)

my $dna = "accggttatgtctaaacggataaagtctcggataaattt";

while (<ENZ>) {
     if ( match pattern etc..) { # I took care of that and created captured groups of 
       $1 = holds "abc"          # the info I needed from the line e.g. I captured
       $2 = ..."at"              # (abc)/(at)'(gtct)//, so they are stored in $1,$2,$3
       $3 = ..."gtct"            # respectively

     }
     while (<$dna>){
          my @fragments_array = split(/$3/, $dna);
          open (OutFile, ">$dna"."_"."$1")
          print OutFile shift @fragments_array,"\n";
          foreach (@fragments_array) {
          print OutFile "$3$_\n";
          close OutFile;
          }
    } 

}
close ENZ;

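FIRST: I can only create an output for the first line of the enzyme file. I want to create an output file for every line.

SECOND: I am not cutting the DNA correctly. From other examples I have seen online, it looks like I will have to use the following to apply the enzyme information to the DNA correctly:

a for loop, length, and substr().

If you can, please show your work in its simplest form (no extravagant, impressive code lol :-), because I am just learning the language).

Thanks in advance!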

Answer 1

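FIRST I can only create an output for line 1 of the enzyme file. I want to create an output file for every line.

That is simply because you put close OutFile; inside the foreach (@fragments_array) loop instead of after the loop body.

SECOND I am not cutting the DNA correctly.

That is because you forgot to include $2 (at, the head of the recognition site atgtct) in the split pattern as well as in the output.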
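A minimal sketch of that fix to the split approach, written as a stand-alone helper rather than the asker's exact loop. The name cut_with_split is made up for illustration, and the sketch assumes the head and tail are plain lowercase base strings:

```perl
use strict;
use warnings;

# Hypothetical helper: cut $dna between $head and $tail at every recognition site.
sub cut_with_split {
    my ($dna, $head, $tail) = @_;
    # Split on the whole site (head + tail). The limit of -1 keeps trailing
    # empty fields, so a site at the very end of the string is not lost.
    my @pieces = split /\Q$head$tail\E/, $dna, -1;
    my @fragments;
    for my $i (0 .. $#pieces) {
        my $fragment = $pieces[$i];
        $fragment = $tail . $fragment if $i > 0;          # give the tail back
        $fragment = $fragment . $head if $i < $#pieces;   # give the head back
        push @fragments, $fragment;
    }
    return @fragments;
}

my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
my @fragments = cut_with_split($dna, "at", "gtct");
print "$_\n" for @fragments;
```

For enzyme abc this prints accggttat and gtctaaacggataaagtctcggataaattt on separate lines, which matches the output the question asks for.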

The problem becomes easier to solve if we simply insert a newline between head and tail wherever the recognition site occurs:

#!/usr/bin/perl
use warnings;
use strict;
open(ENZ, $ARGV[0]) || die; # ENZ (file handle for enzyme file)
my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
while (<ENZ>)
{
    if (m-(.*)/(.*)'(.*)//-)
    {
        my ($head, $tail) = ($2, $3);   # $2$3 is the recognition site; save it
        open(OutFile, ">${dna}_$1") || die;   # one output file per enzyme line
        (my $fragments = $dna) =~ s/$head$tail/$head\n$tail/g;  # insert NLs
        print OutFile $fragments, "\n";
        close OutFile;
    } 
}
close ENZ;
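
Since the question mentions a loop, length and substr(), the same cut can also be done without a substitution. The following is only a sketch under the same assumptions as above; cut_with_substr is a hypothetical name, and index() is used to find each recognition site:

```perl
use strict;
use warnings;

# Hypothetical helper: cut $dna right after $head wherever $head.$tail occurs,
# using only index(), substr() and length().
sub cut_with_substr {
    my ($dna, $head, $tail) = @_;
    my $site = $head . $tail;
    return ($dna) if $site eq '';                # nothing to search for
    my @fragments;
    my $start = 0;                               # where the current fragment begins
    my $pos   = index($dna, $site);              # first recognition site, or -1
    while ($pos != -1) {
        my $cut = $pos + length($head);          # the cut falls right after the head
        push @fragments, substr($dna, $start, $cut - $start);
        $start = $cut;
        $pos   = index($dna, $site, $pos + 1);   # look for the next site
    }
    push @fragments, substr($dna, $start);       # whatever is left after the last cut
    return @fragments;
}

my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
print join("\n", cut_with_substr($dna, "cgg", "ataaa")), "\n";
```

For enzyme def this yields the three fragments from the question: accggttatgtctaaacgg, ataaagtctcgg and ataaattt.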

Answer 2

I changed your code; hopefully it works now.

    #!/usr/bin/perl

    use warnings;
    use strict;

    open(ENZ, $ARGV[0]) || die "Cannot open enzyme file: $!";

    my $dna = "accggttatgtctaaacggataaagtctcggataaattt";

    for my $line (<ENZ>) {
        chomp($line);                               # remove \n at the end of string
        $line =~ s{//.*$}{};                        # remove the trailing "//" and anything after it
        my @elements = split(/\/|'/, $line);        # split string into tokens (e.g. abc/at'gtct => (abc, at, gtct))
        next if @elements < 3;                      # skip lines without a name, head and tail
        my ($enzyme, $firstPart, $secondPart) = @elements;
        # cut at every position that has $firstPart before it and $secondPart after it
        my @fragments = split(/(?<=\Q$firstPart\E)(?=\Q$secondPart\E)/, $dna);
        open(OUTPUT, ">$enzyme" . "_something") || die "Cannot write output for $enzyme: $!";
        print OUTPUT "$_\n" for @fragments;         # one fragment per line
        close(OUTPUT);
    }

    close ENZ;

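Edit: this is the working version. If you want to use Perl in your studies, I recommend learning how to use regular expressions. They are the most powerful tool in Perl.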
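
As a rough illustration of how far a regular expression can take this task, here is a sketch that parses each enzyme line with one pattern and cuts the DNA with a zero-width split. The helper names, the "_fragments.txt" suffix and the assumption that enzyme lines look like abc/at'gtct// are all made up for the example, so adjust them to the real file:

```perl
use strict;
use warnings;

# Hypothetical helper: cut at the zero-width position between head and tail.
sub cut_with_regex {
    my ($dna, $head, $tail) = @_;
    return split /(?<=\Q$head\E)(?=\Q$tail\E)/, $dna;
}

# Hypothetical helper: read enzyme lines such as abc/at'gtct// and write
# one output file per enzyme into $out_dir.
sub write_enzyme_files {
    my ($enzyme_file, $dna, $out_dir) = @_;
    open(my $enz, '<', $enzyme_file) or die "Cannot open $enzyme_file: $!";
    my @written;
    while (my $line = <$enz>) {
        # name / head ' tail //   (lines that do not match are skipped)
        next unless $line =~ m{^\s*([^/\s]+)\s*/\s*([a-z]+)\s*'\s*([a-z]+)\s*//}i;
        my ($name, $head, $tail) = ($1, $2, $3);
        my $out_name = "$out_dir/${name}_fragments.txt";
        open(my $out, '>', $out_name) or die "Cannot write $out_name: $!";
        print {$out} "$_\n" for cut_with_regex($dna, $head, $tail);
        close($out) or die "Cannot close $out_name: $!";   # closed once, after all fragments
        push @written, $out_name;
    }
    close($enz);
    return @written;
}

# Usage: perl cut.pl Enzyme.File
if (@ARGV) {
    my $dna = "accggttatgtctaaacggataaagtctcggataaattt";
    print "wrote $_\n" for write_enzyme_files($ARGV[0], $dna, ".");
}
```

Lexical filehandles and the three-argument form of open are used here so that each output file is opened and closed inside one loop iteration, which keeps the "one file per enzyme line" requirement easy to see.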

Create multiple output files and cut DNA with enzymes - Perl
