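I am trying to process an array of 6 elements in a Perl loop. I want the loop to pull two elements out of the array, perform some functions on them, then loop back and pull the next two elements, until the array runs out of elements. The problem is that the loop only pulls out the first two elements and then stops. Any help here would be greatly appreciated.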
open(infile, 'dnadata.txt');
my @data = <infile>;
chomp @data;
#print @data; #Debug
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
my $i=0;
my $j=0;
my @matrix =();
for(my $i=0; $i<2; $i++){
    for( my $j=0; $j<$aalen; $j++){
        $matrix[$i][$j] = 0;
    }
}
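The guidelines for this program state that it should ignore gaps in the sequences. That means DNA code that is matched up with a gap should be ignored, so the code that is pushed through needs to have the alignment positions linked with gaps removed.
I need to divide the length of the array by 2, since I am comparing two sequences in this part of the loop.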
#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these scalar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";
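This is the loop that is causing the problem. I believe the first loop should go back to the array and pull out the next elements each time it loops, but it doesn't.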
for($i=0; $i<$lenarray; $i++){
    #This code should remove the last value of the array once and
    #then a second time. The sequences should be the same length at this point.
    my $last1 =pop(@data1);
    my $last2 =pop(@data1);
    for($i=0; $i<length($last1); $i++){
        my $letter1 = substr($last1, $i, 1);
        my $letter2 = substr($last2, $i, 1);
        if(($letter1 eq '-')|| ($letter2 eq '-')){
            #I need to put the sequences I am getting rid of somewhere. Here is a good place as any.
            $junk1 = $letter1 . $junk1;
            $junk2 = $letter1 . $junk2;
        }
        else{
            $seq1 = $letter1 . $seq1;
            $seq2 = $letter2 . $seq2;
        }
    }
}
print "$seq1\n";
print "$seq2\n";
print "@data1\n";
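I am actually trying to create a substitution matrix from scratch and return the data. The code looks weird because it isn't finished yet and I got stuck. In case anyone is curious, here are the test sequences.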
YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF
Answer 0 (score: 7)
First off, if you are going to work with sequence data, use BioPerl. Life will be so much easier. However...
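Since you know you will be comparing the lines from your input file as pairs, it makes sense to read them into a data structure that reflects that. As suggested elsewhere, an array like @data = ([line1, line2], [line3, line4]) ensures that the correct pairs of lines always stay together.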
What I'm not clear on is what you are trying to do. So, does the first pair below represent your data, or is it more like the second?
ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC
ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
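Answer 1 (score: 6)
The problem is that you are using $i as the counter variable for both loops, so the inner loop modifies the counter out from under the outer loop. Try changing the inner loop's counter to $j, or use my to scope each counter properly.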
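A minimal sketch of that fix, using two of the test pairs from the question as hard-coded sample data. The sub name count_columns is made up for illustration; the point is only that each loop declares its own counter with my, so the inner loop can no longer overwrite the outer one.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every column of every pair and count the visits.
sub count_columns {
    my @input = @_;
    my $visited = 0;
    for (my $i = 0; $i < @input; $i++) {                 # outer counter
        my ($first, $second) = @{ $input[$i] };
        for (my $j = 0; $j < length($first); $j++) {     # separate inner counter
            $visited++;
        }
    }
    return $visited;
}

# Sample pairs taken from the question's test data.
my @pairs = ( [ 'YFRFR', 'YF-FR' ], [ 'FRFRFR', 'ARFRFR' ] );
print count_columns(@pairs), "\n";   # 5 columns + 6 columns
```

With a shared $i, the inner loop would leave $i at the length of the first sequence, which is usually enough to end the outer loop after a single pass.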
Answer 2 (score: 3)
Don't store your values as a flat array; store them as a two-dimensional array:
my @dataset = ([$val1, $val2], [$val3, $val4]);
or
my @dataset;
push (@dataset, [$val_n1, $val_n2]);
Then:
for my $value (@dataset) {
    ### Do stuff with $value->[0] and $value->[1]
}
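One way the pairs might be built from the lines of the file is sketched below. The helper name pair_up is hypothetical, and the sample lines are the test sequences from the question; a trailing unpaired line is silently dropped here, which is an assumption rather than something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: group a flat list of lines into [first, second] pairs.
# A leftover single line at the end is ignored.
sub pair_up {
    my @lines = @_;
    my @dataset;
    while (@lines >= 2) {
        my $first  = shift @lines;
        my $second = shift @lines;
        push @dataset, [ $first, $second ];
    }
    return @dataset;
}

my @dataset = pair_up(qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF));

for my $value (@dataset) {
    print "$value->[0] / $value->[1]\n";
}
```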
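Answer 3 (score: 2)
There are a lot of strange things in your code: you initialize a matrix and then never use it; you read the whole file into an array; you scan the strings C-style but do nothing with the non-matching values; and finally, you print only the last two processed values (which, in your case, are the first two elements of your array, since you are using pop).
Here's a guess.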
use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);
# Our work function.
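sub do_something {
    # Named $first/$second rather than $a/$b, which are reserved for sort.
    my ($first, $second) = @_;
    $first =~ s/$regex//g; # removing unwanted characters
    $second =~ s/$regex//g; # ditto
    # Printing, saving, whatever...
    print "Something: $first - $second\n";
    return ($first, $second);
}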
my $prev;
while (<>) {
    chomp;
    # Test with defined so that a line such as "0" still counts as data.
    if (defined $prev) {
        do_something($prev, $_);
        $prev = undef;
    } else {
        $prev = $_;
    }
}
print STDERR "Warning: trailing data: $prev\n"
    if defined $prev;
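Answer 4 (score: 1)
Since you are a complete Perl/programming newbie, I am going to show a rewrite of your first code block, and then offer some general advice and links.
Let's look at your first block of sample code. There is a lot of stuff all strung together, and it's hard to follow. I, personally, am too dumb to keep more than a few things in my head at once, so I chop problems into small pieces that I can understand. This is known as "chunking".
One easy way to chunk your program is to write subroutines. Take any particular action or idea that is likely to be repeated, or that would make the current section of code long and hard to understand, wrap it up into a nice neat package, and get it out of the way.
It also helps to add space to your code to make it easier to read. Your mind is already struggling to make sense of the code soup, so why make things harder? Grouping like things together, using _ in names, blank lines, and indentation all help. There are also conventions that can help, such as writing constant values (values that cannot or should not change) in all capital letters.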
use strict; # Using strict will help catch errors.
use warnings; # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages
# Put constants at the top of your program.
# It makes them easy to find, and change as needed.
my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);
my $DATA_FILE = 'dnadata.txt';
# Here I am using subroutines to encapsulate complexity:
my @data = read_data_file( $DATA_FILE );
my @matrix = initialize_matrix( 2, $AMINO_COUNT, 0 );
# now we are done with the first block of code and can do more stuff
...
# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.
# Here are the actual subs that I abstracted out above.
# It helps to document your subs:
# - what they do
# - what arguments they take
# - what they return
# Read a data file and returns an array of dna strings read from the file.
#
# Arguments
# data_file => path to the data file to read
sub read_data_file {
    my $data_file = shift;
    # Here I am using a 3 argument open, and a lexical filehandle.
    open( my $infile, '<', $data_file )
        or die "Unable to open $data_file - $!\n";
    # I've left slurping the whole file intact, even though it can be very inefficient.
    # Other times it is just what the doctor ordered.
    my @data = <$infile>;
    chomp @data;
    # I return the data array rather than a reference
    # to keep things simple since you are just learning.
    #
    # In my code, I'd pass a reference.
    return @data;
}
# Initialize a matrix (or 2-d array) with a specified value.
#
# Arguments
# $i => width of matrix
# $j => height of matrix
# $value => initial value
sub initialize_matrix {
    my $i = shift;
    my $j = shift;
    my $value = shift;
    # I use two powerful perlisms here: map and the range operator.
    #
    # map is a list construction function that is very very powerful.
    # It calls the code in brackets for each member of the list it operates against.
    # Think of it as a for loop that keeps the result of each iteration,
    # and then builds an array out of the results.
    #
    # The range operator `..` creates a list of intervening values. For example:
    # (1..5) is the same as (1, 2, 3, 4, 5)
    my @matrix = map {
        [ ($value) x $i ]
    } 1..$j;
    # So here we make a list of numbers from 1 to $j.
    # For each member of the list we
    # create an anonymous array containing a list of $i copies of $value.
    # Then we add the anonymous array to the matrix.
    return @matrix;
}
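As a small standalone illustration of the two idioms used in initialize_matrix, the sketch below applies map and the x repetition operator to made-up values. The names @squares and @grid are purely illustrative and not part of the rewrite above.

```perl
use strict;
use warnings;

# map keeps the result of each iteration and returns them as a list.
my @squares = map { $_ * $_ } 1 .. 5;
print "@squares\n";

# Each pass through the block builds a new anonymous array,
# so the rows can be modified independently of one another.
my @grid = map { [ (0) x 3 ] } 1 .. 2;
$grid[0][1] = 7;
print join(' ', @$_), "\n" for @grid;
```

Because the anonymous array is created inside the map block, changing one row leaves the others untouched, which is what you want for a matrix of counters.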
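Now that the code rewrite is done, here are some links:
Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing a software project from a specification. It is aimed at beginners. I hope you find it helpful. If nothing else, the links in it should be handy.
For a beginning programmer starting out with Perl, there is no better book than Learning Perl.
I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active, Perl-specific community site with very smart, friendly people who are happy to help you. It is a bit like Stack Overflow, but more focused.
Good luck!
Answer 5 (score: 0)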
Instead of using a C-style for loop, you can use splice in a while loop to read two elements at a time from the array:
while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
    # stuff...
}
I've cleaned up some of your other code below:
use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt') or die "Unable to open dnadata.txt - $!\n";
my @data = <$infile>;
close $infile;
chomp @data;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
# initialize a 2 x 20 array (one column per amino acid) for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
    foreach my $j (0 .. $aalen-1)
    {
        $matrix->[$i][$j] = 0;
    }
}
# Process the DNA data two lines (one pair of sequences) at a time
while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
    # do something... not sure what?
    # you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}
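For what it's worth, here is one guess at what the body of that loop might do, based on the question's stated goal of ignoring positions aligned with a gap. The helper strip_gap_columns is hypothetical, the data is the question's test sequences hard-coded in place of the file, and comparing only up to the shorter sequence is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every column in which either sequence has a gap.
sub strip_gap_columns {
    my ($first, $second) = @_;
    my ($kept1, $kept2) = ('', '');
    # Assumption: only compare up to the length of the shorter sequence.
    my $len = length($first) < length($second) ? length($first) : length($second);
    for my $pos (0 .. $len - 1) {
        my $c1 = substr($first, $pos, 1);
        my $c2 = substr($second, $pos, 1);
        next if $c1 eq '-' or $c2 eq '-';
        $kept1 .= $c1;
        $kept2 .= $c2;
    }
    return ($kept1, $kept2);
}

# The question's test sequences, standing in for the contents of dnadata.txt.
my @data = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);

while (my ($seq1, $seq2) = splice(@data, 0, 2)) {
    my ($clean1, $clean2) = strip_gap_columns($seq1, $seq2);
    print "$clean1\n$clean2\n";
}
```

Note that splice empties @data as it goes, and that an odd number of lines would leave the second variable undefined on the last pass, so real input may need a check for that.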