I have a hash structure like this:
my %my_hash=(
'gee1' => {
'gene' => '20',
'mRNA' => '9',
'CDS' => '10',
'exon' => '10',
'product' => '10',
},
'gee2' => {
'gene' => 'aa',
'mRNA' => '9',
'CDS' => '1aa',
'exon' => '1aa',
'product' => 'ab',
},
'gee4' => {
'gene' => 'aa',
'rRNA' => '9',
'product' => 'ab',
'locus' => 'abc'
},
'gee11' => {
'gene' => 'aa',
'rRNA' => '9',
'product' => 'ab',
'locus' => 'abc'
});
When I try to print the hash in the order shown above, using the code below:
for my $id (
    sort {
        my ($anum) = ( $a =~ /\w(\d+)$/ );
        my ($bnum) = ( $b =~ /\w(\d+)$/ );
        $anum <=> $bnum
    } keys %my_hash
) {
    print "$id\n";
    for my $id1 ( keys %{ $my_hash{$id} } ) {
        print "\t$id1\n";
    }
}
the output looks like this:
gee1
	product
	exon
	gene
	mRNA
	CDS
gee2
	product
	exon
	CDS
	mRNA
	gene
gee4
	product
	locus
	gene
	rRNA
gee11
	rRNA
	gene
	product
	locus
You can see that
	product
	exon
	gene
	mRNA
	CDS
the part above is not ordered.
Is there a way to sort that part as well?
The order I want is: gene, mRNA, rRNA, CDS, exon, product, locus
Answer 0 (score: 2)
Just create a hash that gives each key its position, and sort the keys by it:
my %keys_sort;
my $c = 0;
$keys_sort{$_} = $c++ for qw( gene mRNA rRNA CDS exon product locus );
and use it to sort them:
for my $id1 (sort { $keys_sort{$a} <=> $keys_sort{$b} }
keys %{ $my_hash{$id} }
) {
Or use the list directly, but grep only the keys that are actually present:
for my $id1 (grep exists $my_hash{$id}{$_},
qw( gene mRNA rRNA CDS exon product locus )
) {
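Putting the first variant together, a minimal self-contained sketch might look like the following. The data is a shortened copy of the hash from the question; the names %rank and ordered_subkeys are made up for this illustration, and it assumes every subkey appears in the rank list (an unknown subkey would have an undefined rank and trigger a warning).

```perl
use strict;
use warnings;

# Shortened copy of the question's data.
my %my_hash = (
    gee1 => { gene => '20', mRNA => '9', CDS => '10', exon => '10', product => '10' },
    gee4 => { gene => 'aa', rRNA => '9', product => 'ab', locus => 'abc' },
);

# Map each subkey to its position in the desired order.
my %rank;
my $c = 0;
$rank{$_} = $c++ for qw( gene mRNA rRNA CDS exon product locus );

# Hypothetical helper: return the subkeys of one record in rank order.
sub ordered_subkeys {
    my ($record) = @_;
    return sort { $rank{$a} <=> $rank{$b} } keys %$record;
}

for my $id ( sort keys %my_hash ) {
    print "$id\n";
    print "\t$_\n" for ordered_subkeys( $my_hash{$id} );
}
```

The outer loop uses a plain sort only to keep the sketch short; the numeric-suffix sort from the question would replace it for the full data set.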
Answer 1 (score: 0)
You just need to define the order in which you want to see the subkeys:
sub by_int_suffix {
    my ($anum) = ( $a =~ /\w(\d+)$/ );
    my ($bnum) = ( $b =~ /\w(\d+)$/ );
    $anum <=> $bnum;
}
my @sub_key_order = qw(gene mRNA rRNA CDS exon product locus);
my @key_order = sort by_int_suffix keys %my_hash;
for my $id (@key_order) {
    print "$id\n";
    for my $id1 (@sub_key_order) {
        next unless exists $my_hash{$id}->{$id1};
        print "\t$id1\n";
    }
}
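Answer 2 (score: 0)
I'd strongly suggest that once a sort routine reaches this length, you stop trying to embed it as an anonymous sub.
sort accepts the name of a subroutine (or a scalar holding a reference to one). That subroutine works on the package variables $a and $b and returns -1, 0 or 1 according to whatever criteria you like (for example, using cmp or <=>).
The important thing to note here: hashes are explicitly unordered. In fact, they are usually deliberately ordered at random behind the scenes. So you also need to sort your 'inner' hash.
So with that in mind, I'd suggest something like this: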
use strict;
use warnings;
my %my_hash = (
'gee1' => {
'gene' => '20',
'mRNA' => '9',
'CDS' => '10',
'exon' => '10',
'product' => '10',
},
'gee2' => {
'gene' => 'aa',
'mRNA' => '9',
'CDS' => '1aa',
'exon' => '1aa',
'product' => 'ab',
},
'gee4' => {
'gene' => 'aa',
'rRNA' => '9',
'product' => 'ab',
'locus' => 'abc'
},
'gee11' => {
'gene' => 'aa',
'rRNA' => '9',
'product' => 'ab',
'locus' => 'abc'
}
);
sub sort_key_number {
    # Compare two keys by their trailing number (uses sort's $a and $b).
    my ($anum) = ( $a =~ /(\d+)$/ );
    my ($bnum) = ( $b =~ /(\d+)$/ );
    #print "$anum, $bnum\n";
    return $anum <=> $bnum;
}
my @subkeys = qw ( gene mRNA rRNA CDS exon product locus );
foreach my $key ( sort sort_key_number keys %my_hash ) {
    print "\n", $key, ":\n";
    foreach my $subkey (@subkeys) {
        # Subkeys missing from this record print with an empty value.
        print "\t", $subkey, " = ", $my_hash{$key}{$subkey} // '', "\n";
    }
}
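You have sorted the outer hash by 'key number', and you process the inner hash in the same specific order every time (defined by @subkeys).
If you want to be a bit fancy, you can do the last foreach loop with a slice: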
print join ( "\n\t", @{$my_hash{$key}}{@subkeys} );
But I decided not to, for the sake of clarity.
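One caveat with the slice: it yields undef for any subkey a record does not have, which produces "uninitialized value" warnings under use warnings. A possible sketch that first restricts the slice to the subkeys that are present (the record is copied from the question; the names @present and $line are only for illustration):

```perl
use strict;
use warnings;

my %record  = ( gene => 'aa', rRNA => '9', product => 'ab', locus => 'abc' );
my @subkeys = qw( gene mRNA rRNA CDS exon product locus );

# Keep only the subkeys this record actually has, in the desired order.
my @present = grep { exists $record{$_} } @subkeys;

# Hash slice: fetch the values for all of @present in one go.
my $line = join "\t", @record{@present};
print "$line\n";
```

For a nested hash the same slice is written @{ $my_hash{$key} }{@present}.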
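Answer 3 (score: 0)
It's a little unusual to display only the keys of the second-level hashes, but the answer is to use a separate array (here I have used @required) to define the keys and the order you want them in.
Others have already explained something very similar, so I have used grep here for a bit of variation. Instead of printing the keys of each sub-hash, I chose to print the elements of @required for which a corresponding key exists.
For the same reason I have used map to create the pair of numeric values that need to be compared in the outer sort operation.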
use strict;
use warnings;
use 5.010;
my %my_hash=(
gee1 => { CDS => 10, exon => 10, gene => 20, mRNA => 9, product => 10 },
gee11 => { gene => "aa", locus => "abc", product => "ab", rRNA => 9 },
gee2 => { CDS => "1aa", exon => "1aa", gene => "aa", mRNA => 9, product => "ab" },
gee4 => { gene => "aa", locus => "abc", product => "ab", rRNA => 9 },
);
my @sorted_keys = sort {
my ($aa, $bb) = map /(\d+)/, $a, $b;
$aa <=> $bb;
} keys %my_hash;
my @required = qw/ gene mRNA rRNA CDS exon product locus /;
for my $key ( @sorted_keys ) {
print "$key\n";
for ( grep exists $my_hash{$key}{$_}, @required ) {
print " $_\n";
}
}
Output
gee1
gene
mRNA
CDS
exon
product
gee2
gene
mRNA
CDS
exon
product
gee4
gene
rRNA
product
locus
gee11
gene
rRNA
product
locus
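Update
Those missing hash values bothered me, and I should point out that you can display the values as well by changing the last print statement to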
print " $_ = $my_hash{$key}{$_}\n"
The output is now
gee1
gene = 20
mRNA = 9
CDS = 10
exon = 10
product = 10
gee2
gene = aa
mRNA = 9
CDS = 1aa
exon = 1aa
product = ab
gee4
gene = aa
rRNA = 9
product = ab
locus = abc
gee11
gene = aa
rRNA = 9
product = ab
locus = abc
Answer 4 (score: 0)
You're doing too much at once. Use two loops: an outer loop that sorts the genes in %my_hash, and an inner loop that sorts each gene's items:
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say fc);    # fc must be enabled explicitly (Perl 5.16+)
my %my_hash=(
'gee1' => { 'gene' => '20', 'mRNA' => '9', 'CDS' => '10', 'exon' => '10', 'product' => '10', },
'gee2' => { 'gene' => 'aa', 'mRNA' => '9', 'CDS' => '1aa', 'exon' => '1aa', 'product' => 'ab', },
'gee4' => { 'gene' => 'aa', 'rRNA' => '9', 'product' => 'ab', 'locus' => 'abc' },
'gee11' => { 'gene' => 'aa', 'rRNA' => '9', 'product' => 'ab', 'locus' => 'abc' },
);
for my $gene ( sort { fc $a cmp fc $b } keys %my_hash ) {
    say $gene;
    my %types = %{ $my_hash{$gene} };
    for my $type ( sort { fc $a cmp fc $b } keys %types ) {
        say " $type: " . $my_hash{$gene}->{$type};
    }
}
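Although I could have done a straight sort on this data set, I noticed that there are both uppercase and lowercase keys. A plain sort compares strings by character code (or by the current collation locale under use locale), which can put uppercase letters before lowercase ones. Use fc to fold the case.
That is, I did this: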
sort { fc $a cmp fc $b } @array;
instead of just:
sort @array;
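To make the difference visible, here is a small sketch. The word list is made up for the illustration ("Locus" is capitalised on purpose; with the exact keys from the question both sorts happen to give the same order), and fc assumes Perl 5.16 or later with the feature enabled:

```perl
use strict;
use warnings;
use feature qw(fc);    # fc is available from Perl 5.16

# Made-up list mixing uppercase and lowercase initials.
my @types = qw( rRNA CDS exon Locus );

# Plain sort: uppercase ASCII letters compare lower than lowercase ones.
my @plain = sort @types;

# Case-folded sort: compare the fc() forms, keep the original spelling.
my @folded = sort { fc($a) cmp fc($b) } @types;

print "plain:  @plain\n";
print "folded: @folded\n";
```

Without a locale in effect, the plain sort puts CDS and Locus ahead of exon, while the folded sort interleaves them alphabetically.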
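You might also consider using a Schwartzian Transform, which may be faster. If nothing else, you can use a Schwartzian Transform to keep the data in sorted order in case you don't want to print it out right away.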
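As a rough sketch of that idea applied to the gene IDs from the question (the array names are illustrative, and sorting by the trailing number rather than by folded case is an assumption taken from the question's own sort):

```perl
use strict;
use warnings;

my @ids = qw( gee11 gee2 gee4 gee1 );

# Schwartzian Transform: decorate each ID with its numeric suffix,
# sort on that precomputed number, then strip the decoration again.
my @sorted_ids =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ /(\d+)$/ ? $1 : 0, $_ ] }
    @ids;

print "@sorted_ids\n";
```

The regex runs once per ID instead of once per comparison, and @sorted_ids keeps the order for later use.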