My file looks like this:
1 1 A12P P1234
1 0 A52Q P1234
1 1 M12P P8866
1 1 R50T P1222
1 1 A82L P8866
0 0 D83F P8866
....
In a Perl loop, I parse each line and store the value of each column:
while (<FILE>){
...
my $actual=$1; my $predicted=$2; $mutation=$3; $id=$4;
if ($actual==1) {
if ($predicted==1) { $hash{$id}{$truep}++; }
elsif ($predicted==0){ $hash{$id}{$falsen}++; }
} elsif ($actual==0) {
if ($predicted==0) { $hash{$id}{$truen}++; }
elsif ($predicted==1){ $hash{$id}{$falsep}++; }
}
}
for $id (keys %hash) {
$tp = $hash{$id}{$truep};
$tn = $hash{$id}{$truen};
$fp = $hash{$id}{$falsep};
$fn = $hash{$id}{$falsen};
...
}
For P8866, I expect:
$hash{$id}{$fp} --> FP = 0
$hash{$id}{$tp} --> TP = 2
$hash{$id}{$fn} --> FN = 0
$hash{$id}{$tn} --> TN = 1
And for P1234, I expect:
$hash{$id}{$fp} --> FP = 0
$hash{$id}{$tp} --> TP = 1
$hash{$id}{$fn} --> FN = 1
$hash{$id}{$tn} --> TN = 0
But it does not give me the expected values. Did I define the hash incorrectly? Is there a better way to parse the file?
Answer 0 (score: 2)
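Assuming you really want tn and the others to be constant keys rather than variables, you need to initialize all four counters for each id so that the hash of hashes holds 0 instead of a missing value. If $truep, $falsen and so on are never assigned (the question does not show them being set), they are all undef and collapse into the same empty-string key, which would explain the unexpected counts.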
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys=1;
my %hash;
while (<DATA>) {
    chomp;
    my ($actual, $predicted, $mutation, $id) = split;
    for my $type (qw(tn tp fn fp)) {
        $hash{$id}{$type} = 0 unless exists $hash{$id}{$type};
    }
    if (($actual == 1) and ($predicted == 1)) {
        $hash{$id}{tp}++;
    }
    elsif (($actual == 1) and ($predicted == 0)) {
        $hash{$id}{fn}++;
    }
    elsif (($actual == 0) and ($predicted == 0)) {
        $hash{$id}{tn}++;
    }
    elsif (($actual == 0) and ($predicted == 1)) {
        $hash{$id}{fp}++;
    }
}
print Dumper(\%hash);
__DATA__
1 1 A12P P1234
1 0 A52Q P1234
1 1 M12P P8866
1 1 R50T P1222
1 1 A82L P8866
0 0 D83F P8866
This prints:
$VAR1 = {
          'P1222' => {
                       'fn' => 0,
                       'fp' => 0,
                       'tn' => 0,
                       'tp' => 1
                     },
          'P1234' => {
                       'fn' => 1,
                       'fp' => 0,
                       'tn' => 0,
                       'tp' => 1
                     },
          'P8866' => {
                       'fn' => 0,
                       'fp' => 0,
                       'tn' => 1,
                       'tp' => 2
                     }
        };
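
As a possible variation on the same idea, here is a minimal sketch that replaces the if/elsif chain with a lookup table and then reads the four counters back per id, as the loop in the question tries to do. The %label table and the @lines array are names made up for this illustration (the rows are copied from the question), and the //= operator assumes Perl 5.10 or newer.

```perl
use strict;
use warnings;

# Hypothetical lookup table: "actual predicted" pair -> counter name.
my %label = (
    '1 1' => 'tp',
    '1 0' => 'fn',
    '0 0' => 'tn',
    '0 1' => 'fp',
);

# Sample rows copied from the question; a real script would read a file.
my @lines = (
    '1 1 A12P P1234',
    '1 0 A52Q P1234',
    '1 1 M12P P8866',
    '1 1 R50T P1222',
    '1 1 A82L P8866',
    '0 0 D83F P8866',
);

my %hash;
for my $line (@lines) {
    my ($actual, $predicted, $mutation, $id) = split ' ', $line;
    my $type = $label{"$actual $predicted"};
    next unless defined $type;    # skip rows that are not a 0/1 pair

    # Create all four counters at 0 the first time an id is seen.
    $hash{$id} //= { tp => 0, fn => 0, tn => 0, fp => 0 };
    $hash{$id}{$type}++;
}

# Read the counters back using literal string keys, via a hash slice.
for my $id (sort keys %hash) {
    my ($tp, $tn, $fp, $fn) = @{ $hash{$id} }{qw(tp tn fp fn)};
    print "$id TP=$tp TN=$tn FP=$fp FN=$fn\n";
}
```

The important part is that the keys are the literal strings tp, tn, fp and fn in both the counting loop and the read-back loop, so each counter lives in its own slot.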