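This is part of a larger assignment. In this part, I am trying to write a program that builds a hash. The keys are the accession numbers in the file and the values are the whole lines. However, the program gives me a warning. The code is: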
#!/usr/bin/perl
#pseudocode:
#open file1, store UniProt accession as key and the line as value
#open file2, store UniProt accession as key and the line as value, only for lines that contain "IDA"
#compare keys in two hashes, find out matched keys
#print out lines from file2 that match
use strict;
use warnings;
use feature qw(say);
my $infile1 = "geneIDs3_MouseToUniProtAccessions.txt";
my $inFH1;
open ($inFH1, "<", $infile1) or die join (" ", "Can't open", $infile1, "for reading:", $!);
my @array1 = <$inFH1>;
close $inFH1;
shift @array1;
my %geneID1;
for ($a = 0; $a < scalar @array1; $a++){
chomp $array1[$a];
$array1[$a] =~ /.*?\t(.*?)\t.*/;
$geneID1{$1} = $array1[$a];
#say ("$1", '->', "$geneID1{$array1[$a]}"); #test if the hash has been successfully created, however it doesn't
#say $array1[$a]; #test if the program can recognize the elements, it does
}
The file geneIDs3_MouseToUniProtAccessions.txt contains 1,000 lines, so there are a lot of warnings. Its first two lines are:
From To Species Gene Name
PNMA3 Q9H0A4 Homo sapiens paraneoplastic antigen MA3
The warnings look like this:
Use of uninitialized value within %geneID1 in string at match_for_part_III_10.pl line 24.
Q9H0A4->
I found a solution: use a while loop instead. Not only does it work, it is also more elegant. The new code is:
#!/usr/bin/perl
#pseudocode:
#open file1, store UniProt accession as key and the line as value
#open file2, store UniProt accession as key and the line as value, only for lines that contain "IDA"
#compare keys in two hashes, find out matched keys
#print out lines from file2 that match
use strict;
use warnings;
use feature qw(say);
my $infile1 = "geneIDs3_MouseToUniProtAccessions.txt";
my $inFH1;
open ($inFH1, "<", $infile1) or die join (" ", "Can't open", $infile1, "for reading:", $!);
my %geneID1;
while (<$inFH1>){
$_ =~ /.*?\t(.*?)\t.*/;
$geneID1{$1} = $_;
say ("$1", '->', "$geneID1{$1}");
}
close $inFH1;
Thanks, everyone, for your help!
Answer 0 (score: 3)
#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );
<>; # Skip header.
my %geneID1;
while (<>) {
chomp;
my @fields = split /\t/;
my $id = $fields[1];
$geneID1{$id} = $_;
}
say "$_ => $geneID1{$_}" for sort keys %geneID1;
(Pass geneIDs3_MouseToUniProtAccessions.txt as an argument.)
Answer 1 (score: 2)
It is hard to say what the error is, what with the tabs (are they really tabs?) and the code in the question being changed.
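One reading that fits the quoted warning, offered as a guess rather than a diagnosis: the commented-out say in the first listing looks up $geneID1{$array1[$a]}, i.e. it uses the whole line as the key, while the hash is keyed by the accession in $1. The while-loop version looks up $geneID1{$1}. A minimal sketch of the difference, assuming the sample line is tab-separated as shown (the tab positions are an assumption):

```perl
use strict;
use warnings;

# Hypothetical sample line; where the tabs fall is an assumption.
my $line = "PNMA3\tQ9H0A4\tHomo sapiens\tparaneoplastic antigen MA3";

my %geneID1;
# Same pattern as in the question, but $1 is used only if the match succeeded.
if ($line =~ /.*?\t(.*?)\t.*/) {
    $geneID1{$1} = $line;
}

# The hash is keyed by accession, so the whole line is not a key in it.
my $by_line = $geneID1{$line};      # undef; interpolating it into a string warns
my $by_id   = $geneID1{'Q9H0A4'};   # the stored line

print "lookup by line: ", (defined $by_line ? $by_line : "(undef)"), "\n";
print "lookup by id:   ", (defined $by_id   ? $by_id   : "(undef)"), "\n";
```

Interpolating the undefined $by_line into a string is the kind of use that triggers a "Use of uninitialized value" warning under use warnings.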
That said, a number of things in the code can be improved:
use warnings;
use strict;
use feature 'say';
my $file = 'geneIDs3_MouseToUniProtAccessions.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
my %geneID1;
my $header = <$fh>;
while (<$fh>) {
chomp;
$geneID1{ (split /\t/)[1] } = $_;
}
say "$_ => $geneID1{$_}" for sort keys %geneID1;
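One wild card is your data. If you are not sure the separators are TAB characters, split on \s+ (which also matches tabs), since you only need the second field. Splitting on whitespace is also what split does by default, so you can write (split)[1].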
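To make the difference between the separators concrete, here is a small sketch on a made-up line. The tab positions in $line are an assumption about the data, and the variable names are only for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample line; where the tabs fall is an assumption.
my $line = "PNMA3\tQ9H0A4\tHomo sapiens\tparaneoplastic antigen MA3";

my @by_tab   = split /\t/,  $line;   # 4 fields; a field may contain spaces
my @by_space = split /\s+/, $line;   # 7 fields; every whitespace run separates
my @default  = split ' ',   $line;   # what a bare split does to $_

# The second field is the accession in all three cases.
say "tab:     $by_tab[1]";
say "\\s+:     $by_space[1]";
say "default: $default[1]";

# The two whitespace forms differ only on leading whitespace:
# /\s+/ yields an empty first field, the default form skips it.
my @lead_regex   = split /\s+/, "  x y";   # ('', 'x', 'y')
my @lead_default = split ' ',   "  x y";   # ('x', 'y')
say scalar(@lead_regex), " vs ", scalar(@lead_default), " fields";
```

If later fields matter (the species or gene name contain spaces), splitting on /\t/ is the safer choice; for the second field alone, any of the three works as long as lines do not start with whitespace.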
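Comments on the original code
Read the whole file up front only if there is a specific reason to
Declare everything, even where a special case lets you get away without it ($a)
Declare in the smallest possible scope and close to where the variable is needed: open my $fh, ...
Do not use special variables such as $a for anything but their intended purpose! ($a and $b are reserved for sort blocks.)
A C-style for loop is almost never needed. If you need the index while iterating, use
foreach my $i (0 .. $#ary) { ... }
where $#ary is the index of the last element of the array @ary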
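The loop-related points above can be sketched as follows. The array contents are made up for illustration. Note that use strict did not complain about $a in the question because $a and $b are package variables exempt from strict vars, which is exactly why they are easy to misuse.

```perl
use strict;
use warnings;
use feature 'say';

# Made-up data standing in for lines read from a file.
my @ary = ("first\n", "second\n", "third\n");

# Index needed: iterate over 0 .. $#ary with a lexical loop variable.
foreach my $i (0 .. $#ary) {
    chomp $ary[$i];
    say "$i: $ary[$i]";
}

# Index not needed: iterate over the elements directly.
# The loop variable aliases each element, so assigning to it changes the array.
foreach my $item (@ary) {
    $item = uc $item;
}

# $a and $b used for what they are meant for: the comparison in a sort block.
my @sorted = sort { $a cmp $b } @ary;
say "@sorted";
```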