Question

I can't work out the right algorithm. I'm iterating over an input file with loops. The problem I'm having is with the last loop.

#!/usr/bin/perl 
# Lab #4
# Judd Bittman

# http://www-users.cselabs.umn.edu/classes/Spring-2011/csci3003/index.php?page=labs 
# this site has what needs to be in the lab
# lab4 is the lab instructions
# yeast protein is the part that is being read

use warnings;
use strict;

my $file = "<YeastProteins.txt";
open(my $proteins, $file);
my @identifier;
my @qualifier;
my @molecularweight;
my @pi;
while (my $line1 = <$proteins>) {
    #print $line1;
    chomp($line1);
    my @line = split(/\t/, $line1);
    push(@identifier,      $line[0]);
    push(@qualifier,       $line[1]);
    push(@molecularweight, $line[2]);
    push(@pi,              $line[3]);
}
my $extreme  = 0;
my $ex_index = 0;
for (my $index = 1; $index < 6805; $index++) {
    if (   defined($identifier[$index])
        && defined($qualifier[$index])
        && defined($molecularweight[$index])
        && defined($pi[$index])) {
# print"$identifier[$index]\t:\t$qualifier[$index]:\t$molecularweight[$index]:\n$pi[$index]";
    }
    if (   defined($identifier[$index])
        && defined($qualifier[$index])
        && defined($pi[$index])) {
        if (abs($pi[$index] - 7) > $extreme && $qualifier[$index] eq "Verified")
        {
            $extreme  = abs($pi[$index] - 7);
            $ex_index = $identifier[$index];
            print $extreme. " " . $ex_index . "\n";
        }
    }
}
print $extreme;
print "\n";
print $ex_index;
print "\n";

# the part above does part b of the assignment
# YLR204W,its part of the Mitochondrial inner membrane protein as well as a processor.
my $exindex = 0;
my $high    = 0;

# two lines above and below is part c
# there is an error and I know there is something wrong
for (my $index = 1; $index < 6805; $index++) {
    if (   defined($qualifier[$index])
        && ($qualifier[$index]) eq "Verified"
        && defined($molecularweight[$index])
        && (abs($molecularweight[$index]) > $high)) {
        $high    = (abs($molecularweight[$index]) > $high);    # something wrong on this line, I know I wrote something wrong
        $exindex = $identifier[$index];
    }
}

print $high;
print "\n";
print $exindex;
print "\n";
close($proteins);
exit;

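In the last loop, I want to keep track of the protein that is verified and has the highest molecular weight; the data is in the input file. What code can I use to tell the program that I want to keep the highest number and its name? I feel like I'm very close, but I can't put my finger on it.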

Answer 1

First, a note on Perl: it is generally more common to use a foreach-style loop than a C-style indexed loop. For example:

for my $protein (@proteins) {
  # do something with $protein
}

(Your situation may need the index, but I thought I'd mention it.)

As for your specific problem, in this line:

$high = (abs($molecularweight[$index])>$high);

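$high is being set to the result of the boolean test. Remove the > $high part from the assignment (the test is already done in the if statement) and you will probably get the result you want.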
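To make the difference concrete, here is a minimal sketch. The sub names buggy_max and fixed_max and the sample weights are made up for illustration and do not come from YeastProteins.txt. In Perl a true comparison evaluates to 1, so the buggy version can never store an actual weight:

```perl
use strict;
use warnings;

# Hypothetical sample weights, not taken from YeastProteins.txt.
my @weights = (1200.5, 48000.2, 3100.0);

# Buggy version: stores the result of the comparison, not the weight.
sub buggy_max {
    my @values = @_;
    my $high = 0;
    for my $w (@values) {
        if (abs($w) > $high) {
            $high = (abs($w) > $high);    # a true comparison yields 1
        }
    }
    return $high;
}

# Fixed version: the test stays in the if, the assignment stores the value.
sub fixed_max {
    my @values = @_;
    my $high = 0;
    for my $w (@values) {
        if (abs($w) > $high) {
            $high = abs($w);
        }
    }
    return $high;
}

print buggy_max(@weights), "\n";    # prints 1
print fixed_max(@weights), "\n";    # prints 48000.2
```

Once $high has become 1, every later weight above 1 passes the test and simply sets it to 1 again, which is why the original program prints 1 rather than a molecular weight.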

Answer 2

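You probably want a more sophisticated data structure, such as a nested hash. It's hard to give a solid example without knowing more about the data, but suppose your first identifier is abc, your second is def, and so on: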

my %protein_entries = (
    abc => {
        qualifier        => 'something',
        molecular_weight => 1234,
        pi               => 'something',
    },
    def => {
        qualifier        => 'something else',
        molecular_weight => 5678,
        pi               => 'something else',
    },
    # …
);

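Then, instead of having several separate arrays and keeping track of which element belongs to which, you look up an element by identifier and key, for example $protein_entries{abc}{molecular_weight}.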

Then, if you want the entry with the highest molecular weight, you can sort the identifiers by molecular weight and slice off the highest one:

my $highest = (sort {
    $protein_entries{$a}{molecular_weight} 
    <=> 
    $protein_entries{$b}{molecular_weight}
} keys %protein_entries)[-1];

Your algorithm is giving you trouble because, fundamentally, your data isn't structured well.

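In this example, $highest will hold def, and afterwards you can go back and get $protein_entries{def}{molecular_weight}, or any other key in the anonymous hash referenced by $protein_entries{def}, so any related data is easy to recall.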
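As a rough sketch of how such a hash could be built from the tab-separated input, the following assumes the column order used in the question (identifier, qualifier, molecular weight, pI). The sub names parse_proteins and heaviest_verified and the sample rows are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Build a hash of hashes keyed by identifier from tab-separated lines.
# Assumed column order: identifier, qualifier, molecular weight, pI.
sub parse_proteins {
    my @lines = @_;
    my %entries;
    for my $line (@lines) {
        chomp $line;
        my ($id, $qualifier, $mw, $pi) = split /\t/, $line;
        next unless defined $id && defined $mw;    # skip short lines
        $entries{$id} = {
            qualifier        => $qualifier,
            molecular_weight => $mw,
            pi               => $pi,
        };
    }
    return \%entries;
}

# Return the identifier of the heaviest entry whose qualifier is "Verified".
sub heaviest_verified {
    my ($entries) = @_;
    my @verified = grep { $entries->{$_}{qualifier} eq 'Verified' } keys %$entries;
    my @sorted = sort {
        $entries->{$a}{molecular_weight} <=> $entries->{$b}{molecular_weight}
    } @verified;
    return $sorted[-1];    # undef if nothing is verified
}

# Made-up sample rows for illustration only.
my $entries = parse_proteins(
    "abc\tVerified\t1234\t5.5",
    "def\tDubious\t9999\t6.1",
    "ghi\tVerified\t5678\t9.2",
);
my $id = heaviest_verified($entries);
print "$id\t$entries->{$id}{molecular_weight}\n";    # prints ghi, a tab, then 5678
```

Filtering on the qualifier before sorting keeps a header row or unverified entries out of the numeric comparison. For a real file, the lines would come from the filehandle instead of a literal list.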

Answer 3

Just change:

$high    = (abs($molecularweight[$index]) > $high);

to this:

$high    = abs($molecularweight[$index]) if (abs($molecularweight[$index]) > $high);

At the end of the loop, $high will be the highest molecular weight among the verified entries in @molecularweight.
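Putting the fix in context, the whole search could be sketched as below. The sub name heaviest_verified_entry and the sample arrays are hypothetical. Starting at index 1 assumes, as the question's loops suggest, that row 0 is a header line, and $#{$ids} replaces the hard-coded 6805:

```perl
use strict;
use warnings;

# Scan parallel arrays and return (identifier, weight) of the heaviest
# "Verified" entry. Index 0 is skipped on the assumption that it holds
# the header row.
sub heaviest_verified_entry {
    my ($ids, $qualifiers, $weights) = @_;
    my ($best_id, $high) = (undef, 0);
    for my $i (1 .. $#{$ids}) {
        next unless defined $qualifiers->[$i] && defined $weights->[$i];
        next unless $qualifiers->[$i] eq 'Verified';
        if (abs($weights->[$i]) > $high) {
            $high    = abs($weights->[$i]);    # store the weight itself
            $best_id = $ids->[$i];
        }
    }
    return ($best_id, $high);
}

# Made-up sample data for illustration only.
my @identifier      = ('ID',        'abc',      'def',     'ghi');
my @qualifier       = ('Qualifier', 'Verified', 'Dubious', 'Verified');
my @molecularweight = ('MW',        1234,       9999,      5678);

my ($id, $mw) = heaviest_verified_entry(\@identifier, \@qualifier, \@molecularweight);
print "$id\t$mw\n";    # prints ghi, a tab, then 5678
```

Because the assignment already sits inside the if, the trailing if modifier in the one-line change above is redundant there, though harmless.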

A little help with Perl loops
