Question

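I am trying to get a Perl loop to work over an array that contains 6 elements. I want the loop to pull two elements from the array, perform certain functions on them, then loop back and pull the next two elements, until the array runs out of elements. The problem is that the loop only pulls out the first two elements and then stops. Any help would be greatly appreciated.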

my open(infile, 'dnadata.txt');
my @data = < infile>;
chomp @data;
#print @data; #Debug

my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);

my $i=0;
my $j=0;
my @matrix =();
for(my $i=0; $i<2; $i++){
    for( my $j=0; $j<$aalen; $j++){
    $matrix[$i][$j] = 0;

    }
}

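The guidelines for this program state that it should ignore gaps in the sequences. That means DNA code that is aligned with a gap should be ignored, so the sequences that get passed along need to have the gap-aligned positions removed.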

I need to divide the length of the array by 2, since I am comparing two sequences at a time in this part of the loop.

#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these saclar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";

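This is the loop that is causing the problem. I believe the outer loop should go back to the array and pull out the next elements each time around, but it does not.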

for($i=0; $i<$lenarray; $i++){

    #This code should remove the the last value of the array once and 
    #then a second time. The sequences should be the same length at this point. 
my $last1 =pop(@data1);
my $last2 =pop(@data1);
for($i=0; $i<length($last1); $i++){
my $letter1 = substr($last1, $i, 1);
my $letter2 = substr($last2, $i, 1);
    if(($letter1 eq '-')|| ($letter2 eq '-')){ 
    #I need to put the sequences I am getting rid of somewhere. Here is a good place as any. 
    $junk1 = $letter1 . $junk1;
    $junk2 = $letter1 . $junk2;
    }
    else{
    $seq1 = $letter1 . $seq1;
    $seq2 = $letter2 . $seq2;

    }   
}
}
print "$seq1\n";
print "$seq2\n";
print "@data1\n";

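I am actually trying to create a substitution matrix from scratch and return the data. The code looks odd because it is not finished yet and I got stuck. Here are the test sequences, in case anyone is curious.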

YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF

Answer 1

First off, if you are going to work with sequence data, use BioPerl. Life will be so much easier. However...

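Since you know you will be comparing the lines of your input file in pairs, it makes sense to read them into a data structure that reflects that. As suggested elsewhere, an array like @data = ([line1, line2], [line3, line4]) ensures that the correct pairs of lines always stay together.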

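What I am not clear on is what you are trying to do:

a) Are you generating a consensus sequence, where the 2 sequences are identical apart from the gaps?
b) Are your 2 sequences significantly different, and you are trying to exclude the parts that do not align and then generate a consensus?

So, does the first pair below represent your data, or is it more like the second?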

ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC

ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

Answer 2

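The problem is that you are using $i as the counter variable for both of your loops, so the inner loop modifies the counter out from under the outer loop. Try changing the inner loop's counter to $j, or use my to scope the counters properly.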
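To make the fix concrete, here is a minimal sketch with one lexical counter per loop. The sequence data is copied from the question; the @visited array and the gap count are assumptions added only so the loop has something observable to do.

```perl
use strict;
use warnings;

my @data = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);
my @visited;

# The outer loop owns $i and steps through the array two lines at a time.
for (my $i = 0; $i < @data; $i += 2) {
    my ($first, $second) = @data[$i, $i + 1];
    my $gaps = 0;

    # The inner loop has its own counter $j, so $i is never touched here.
    for (my $j = 0; $j < length $first; $j++) {
        $gaps++ if substr($first, $j, 1) eq '-' || substr($second, $j, 1) eq '-';
    }
    push @visited, "pair starting at index $i: $gaps gap column(s)";
}

print "$_\n" for @visited;
```

With separate counters the outer loop visits all three pairs instead of stopping after the first.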

Answer 3

Don't store your values in a flat array; store them in a two-dimensional array:

my @dataset = ([$val1, $val2], [$val3, $val4]);

or

my @dataset;
push (@dataset, [$val_n1, $val_n2]);

Then:

for my $value (@dataset) {
 ### Do stuff with $value->[0] and $value->[1]
}
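If the lines arrive as one flat list, as they do in the question, the pairs can be built with splice. This is a minimal sketch rather than a definitive implementation: the @lines data is copied from the question, and reporting a leftover unpaired line with a warning is an assumption about what you would want.

```perl
use strict;
use warnings;

my @lines = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);

my @dataset;
while (@lines >= 2) {
    # splice removes and returns the first two lines; store them as one pair.
    push @dataset, [ splice(@lines, 0, 2) ];
}
warn "Unpaired trailing line: $lines[0]\n" if @lines;

for my $pair (@dataset) {
    print "$pair->[0] <-> $pair->[1]\n";
}
```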

Answer 4

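There are many strange things in your code: you initialize a matrix and then never use it; you read a whole file into an array; you scan a string C-style but then do nothing with the unmatched values; and finally, you print only the two last processed values (which, in your case, are the first two elements of your array, since you are using pop).

Here's a guess.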

use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';

# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);

# Our work function. 
sub do_something {
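    # Named $first/$second rather than $a/$b, which Perl reserves for sort.
    my ($first, $second) = @_;
    $first  =~ s/$regex//g; # removing unwanted characters
    $second =~ s/$regex//g; # ditto
    # Printing, saving, whatever...
    print "Something: $first - $second\n";

    return ($first, $second);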
}

my $prev;
while (<>) {
    chomp;
    if (defined $prev) {   # defined-test, so a line such as "0" still counts
        do_something($prev, $_);
        $prev = undef;
    } else {
        $prev = $_;
    }
}

print STDERR "Warning: trailing data: $prev\n"
    if defined $prev;

Answer 5

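Since you are a complete Perl/programming newbie, I am going to show a rewrite of your first code block, and then offer some general advice and links.

Let's look at your first block of sample code. There is a lot of stuff strung together, and it is hard to follow. Personally, I am too dumb to keep more than a few things in my head at once, so I chop problems into small pieces that I can understand. This is known as "chunking".

One easy way to chunk your program is to write subroutines. Take any particular action or idea that is likely to be repeated, or that would make the current section of code long and hard to understand, wrap it up into a nice neat package, and get it out of the way.

It also helps if you add space to your code to make it easier to read. Your mind is already struggling to grok the code soup, so why make things harder than necessary? Grouping like things together, using _ in names, blank lines, and indentation all help. There are also conventions that can help, like writing constant values (values that cannot or should not change) in all capital letters.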

use strict;      # Using strict will help catch errors.
use warnings;    # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages

# Put constants at the top of your program.
# It makes them easy to find, and change as needed.

my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);

my $DATA_FILE = 'dnadata.txt';

# Here I am using subroutines to encapsulate complexity:

my @data = read_data_file( $DATA_FILE );
my @matrix = initialize_matrix( 2, $AMINO_COUNT, 0 );

# now we are done with the first block of code and can do more stuff

...

# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.

# Here are the actual subs that I abstracted out above.  
# It helps to document your subs:
#  - what they do
#  - what arguments they take
#  - what they return

# Reads a data file and returns an array of DNA strings read from the file.
# 
# Arguments
#   data_file => path to the data file to read

sub read_data_file {
    my $data_file = shift;

    # Here I am using a 3 argument open, and a lexical filehandle.
    open( my $infile, '<', $data_file )
         or die "Unable to open $data_file - $!\n";

    # I've left slurping the whole file intact, even though it can be very inefficient.
    # Other times it is just what the doctor ordered.
    my @data = <$infile>;
    chomp @data;

    # I return the data array rather than a reference
    # to keep things simple since you are just learning.
    #
    # In my code, I'd pass a reference.

    return @data;
}

# Initialize a matrix (or 2-d array) with a specified value.
# 
# Arguments
#    $i     => width of matrix
#    $j     => height of matrix
#    $value => initial value

sub initialize_matrix {
    my $i     = shift;
    my $j     = shift;
    my $value = shift;

    # I use two powerful perlisms here:  map and the range operator.
    #
    # map is a list construction function that is very very powerful.
    # It calls the code in braces for each member of the list it operates against.
    # Think of it as a for loop that keeps the result of each iteration, 
    # and then builds an array out of the results.
    #
    # The range operator `..` creates a list of intervening values. For example:
    #     (1..5) is the same as (1, 2, 3, 4, 5)

    my @matrix = map {
        [ ($value) x $i ]
    } 1..$j;

    # So here we make a list of numbers from 1 to $j.
    # For each member of the list we
    #     create an anonymous array containing a list of $i copies of $value.
    # Then we add the anonymous array to the matrix.

    return @matrix;
}
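As a quick check of what initialize_matrix returns, the sketch below repeats the sub in compact form and inspects the result. One thing worth noticing: with the argument order documented above (width first, height second), a call with 2 and $AMINO_COUNT produces 20 rows of 2 columns, whereas the loop in the question builds 2 rows of 20. The call here passes (20, 2, 0) on the assumption that the question's layout is the one wanted.

```perl
use strict;
use warnings;

sub initialize_matrix {
    my ($width, $height, $value) = @_;
    # Each pass through map builds a fresh anonymous array,
    # so the rows do not share storage.
    return map { [ ($value) x $width ] } 1 .. $height;
}

my @matrix = initialize_matrix(20, 2, 0);   # 2 rows of 20 zeros
$matrix[0][3]++;                            # modify one cell in row 0 only

print scalar(@matrix), " rows, ", scalar(@{ $matrix[0] }), " columns\n";
print "row 0, col 3: $matrix[0][3]; row 1, col 3: $matrix[1][3]\n";
```

Because every row is its own anonymous array, incrementing a cell in row 0 leaves row 1 untouched.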

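Now that the code rewrite is out of the way, here are some links:

Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing a software project from a specification. It is aimed at beginners, and I hope you find it helpful. If nothing else, the links in it should be handy.

For a beginning programmer starting out with Perl, there is no better book than Learning Perl.

I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active, Perl-specific community site with very smart, friendly people who are happy to help you. Kind of like Stack Overflow, but more focused.

Good luck!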

Answer 6

Instead of using a C-style for loop, you can use splice inside a while loop to read two elements at a time from the array:

while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
    # stuff...
}

I have cleaned up some of your other code below:

use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt')
    or die "Unable to open dnadata.txt - $!\n";
my @data = <$infile>;
close $infile;

chomp @data;

my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);

# initialize a 2 x 20 array for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
    foreach my $j (0 .. $aalen-1)
    {
        $matrix->[$i][$j] = 0;
    }
}

# Process the data two lines at a time (each "letter" variable holds one whole line)
while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
    # do something... not sure what?
    # you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}
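One possible way to fill in the loop body, going by the question's stated requirement to ignore gap-aligned positions, is to drop every column in which either sequence has a '-'. This is only a sketch under that assumption: strip_gap_columns is a hypothetical helper name, not part of the original code, and the data is inlined from the question instead of being read from dnadata.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: returns both sequences with gap columns removed.
sub strip_gap_columns {
    my ($seq1, $seq2) = @_;
    my ($out1, $out2) = ('', '');
    # Only compare positions that exist in both sequences.
    my $len = length($seq1) < length($seq2) ? length($seq1) : length($seq2);
    for my $pos (0 .. $len - 1) {
        my $c1 = substr($seq1, $pos, 1);
        my $c2 = substr($seq2, $pos, 1);
        next if $c1 eq '-' || $c2 eq '-';   # skip columns containing a gap
        $out1 .= $c1;
        $out2 .= $c2;
    }
    return ($out1, $out2);
}

my @data = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);
my @results;
while (my ($seq1, $seq2) = splice(@data, 0, 2)) {
    last unless defined $seq2;              # odd line count: no partner
    push @results, [ strip_gap_columns($seq1, $seq2) ];
}

print "$_->[0]\n$_->[1]\n\n" for @results;
```

The gap-free pairs collected in @results could then feed whatever counting step fills the substitution matrix.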

Why does my Perl for loop exit early?
