I have a file with many lines, each of which contains a repeating pattern. I need a proper data structure to parse the file into, for example:
cluster1:gene1(genome1) gene2(genome2) gene3(genome3)
cluster2:gene4(genome4) gene5(genome5)
The names are arbitrary.
I thought of using a hash of hashes:
my %hoh = (
    "cluster1" => {
        "gene1" => "genome1",
        "gene2" => "genome2",
        "gene3" => "genome3",
    },
    "cluster2" => {
        "gene4" => "genome4",
        "gene5" => "genome5",
    },
);
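I have two questions. First: how do I find the repeated pattern in each line?
Second: how do I build the hash of hashes?
Edit: posting my attempt at Zaid's request: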
#!/usr/bin/perl -w
use strict; use warnings;
my %HoH;
while(<DATA>){
    my $line=$_;
    chomp($line);
    my ( $cluster, $genes ) = split (/:/,$line);
    $HoH{ $cluster } = { split/[( )]+/ , $genes };
}
foreach $cluster (keys %HoH){
    print "$cluster: ";
    foreach $genes (keys %{$HoH{$cluster}}){
        print "$genes = $HoH{$cluster}{$genes} ";
    }
    print "\n";
}
__DATA__
cluster1:gene1(genome1) gene2(genome2) gene3(genome3)
cluster2:gene4(genome4) gene5(genome5)
Answer 0 (score: 5)
Posted once the OP had shown an attempt:
my %HoH;
while (<>) {
    chomp;
    my ( $cluster, $genes ) = split /:/;
    $HoH{ $cluster } = { split /[( )]+/, $genes };
}
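To see why the single split is enough to build the inner hash, here is a small sketch that runs the same two splits on one sample line from the question and reads a value back. The names `$line` and `@fields` are only illustrative and do not appear in the answer.

```perl
use strict;
use warnings;

# Sample line taken from the question; $line and @fields are illustrative names.
my $line = 'cluster1:gene1(genome1) gene2(genome2) gene3(genome3)';

my ( $cluster, $genes ) = split /:/, $line;

# Splitting on runs of "(", ")" or space yields alternating gene/genome fields.
# The final ")" would leave an empty trailing field, but split drops trailing empties.
my @fields = split /[( )]+/, $genes;

# An even-length list inside { } becomes an anonymous hash: key, value, key, value, ...
my %HoH;
$HoH{$cluster} = { @fields };

print join( ',', @fields ), "\n";
print $HoH{cluster1}{gene2}, "\n";
```

Values are then reached with two subscripts, as in `$HoH{cluster1}{gene2}`.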
Answer 1 (score: 1)
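Assuming the pattern always follows AAA:BBB(CCC) DDD(EEE) FFF(GGG)..., you can use the following algorithm:
1. Read a line.
2. Split on ":" and take the first part as your key.
3. Split the rest on whitespace.
4. For each item, match ([^(]+)\(([^)]+)\) and store the two captures as a key/value pair in a hash.
5. $hoh{key from step 2} = the hash from step 4
Untested, but something like the following (I'm a bit unsure about the hash-reference part, but you get the idea):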
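my %hoh;
while (<>) {
    my ($key, $rest) = split ':';       # cluster name, then the gene list
    my @genes = split ' ', $rest;       # whitespace split also drops the newline
    my %h;
    foreach my $gene (@genes) {
        # "gene1(genome1)" splits into gene1 and genome1
        my ($k, $v) = split /[\(\)]/, $gene;
        $h{$k} = $v;
    }
    $hoh{$key} = \%h;                   # store a reference to the inner hash
}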
There is probably a more elegant, Perl-y way, though :)
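The regex idea from this answer can also do the per-line work in one pass with a global match. This is a minimal sketch, assuming gene and genome names contain no whitespace or parentheses. The `@lines` array stands in for the input file and is not part of the original answer.

```perl
use strict;
use warnings;

# Stand-in for the input file (sample data from the question).
my @lines = (
    'cluster1:gene1(genome1) gene2(genome2) gene3(genome3)',
    'cluster2:gene4(genome4) gene5(genome5)',
);

my %hoh;
for my $line (@lines) {
    # Limit of 2: split only at the first colon.
    my ( $key, $rest ) = split /:/, $line, 2;

    # In list context, a /g match returns all captures in order:
    # gene, genome, gene, genome, ... which initializes the hash directly.
    my %genes = $rest =~ /([^\s(]+)\(([^)]+)\)/g;

    $hoh{$key} = \%genes;
}

for my $cluster ( sort keys %hoh ) {
    for my $gene ( sort keys %{ $hoh{$cluster} } ) {
        print "$cluster: $gene = $hoh{$cluster}{$gene}\n";
    }
}
```

Sorting the keys only makes the printed order repeatable, since Perl hashes are unordered.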
Answer 2 (score: 0)
#!/usr/bin/perl -w
use strict; use warnings;
my %HoH;
while(<DATA>){
    my $line=$_;
    chomp($line);
    my ( $cluster, $genes ) = split (/:/,$line);
    $HoH{ $cluster } = { split/[( )]+/ , $genes };
}
foreach my $cluster (keys %HoH){
    print "$cluster: ";
    foreach my $genes (keys %{$HoH{$cluster}}){
        print "$genes = $HoH{$cluster}{$genes} ";
    }
    print "\n";
}
__DATA__
cluster1:gene1(genome1) gene2(genome2) gene3(genome3)
cluster2:gene4(genome4) gene5(genome5)
Answer 3 (score: 0)
Assuming you don't need to worry about lines that don't match the expected input, you can use a single split.
my %data;
while( <DATA> ){
    chomp;
    next unless $_; # skip blank lines
    # first field is the cluster name; the rest are gene/genome pairs
    my($key,%value) = split /[:()\s]+/;
    $data{$key} = \%value;
}
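If malformed lines can occur, one option is to check each line's shape before applying the single split. The sketch below assumes names never contain whitespace, colons, or parentheses. The `$valid` pattern, the `@lines` array, and `@skipped` are hypothetical additions, not part of the answer above.

```perl
use strict;
use warnings;

# Stand-in for the input, with one blank and one malformed line added.
my @lines = (
    'cluster1:gene1(genome1) gene2(genome2) gene3(genome3)',
    '',
    'not a cluster line',
    'cluster2:gene4(genome4) gene5(genome5)',
);

# Hypothetical shape check: "name:" followed by one or more "gene(genome)" items.
my $valid = qr/^[^\s:()]+:(?:\s*[^\s:()]+\([^\s:()]+\))+\s*$/;

my ( %data, @skipped );
for my $line (@lines) {
    if ( $line !~ $valid ) {
        push @skipped, $line if length $line;   # remember non-blank rejects
        next;
    }
    # Same single split as above: first field is the key, the rest are pairs.
    my ( $key, %value ) = split /[:()\s]+/, $line;
    $data{$key} = \%value;
}

printf "%d clusters parsed, %d line(s) skipped\n", scalar keys %data, scalar @skipped;
```

Checking first guarantees the split yields a key followed by an even number of fields, so the inner hash never ends up with an odd element count.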