Modifying an input file in Perl

Time: 2014-04-24 14:53:44

Tags: perl file

I wrote a Perl program that takes 2 text files as input.

The first file contains sequences and their probabilities, in this format:

good morning 0.5

The second file contains every word with its probability, in this format:

good 0.5
morning 0.6

For each sequence, my script computes this formula:

log( prob(sequence) / ( (prob(word1) - prob(sequence)) * (prob(word2) - prob(sequence)) ) )

The problem is that in some cases prob(sequence), prob(word1), and prob(word2) are equal, so I get "Illegal division by zero".
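To make the failure concrete, here is a minimal sketch that evaluates the denominator for the sample lines shown above (good morning 0.5, good 0.5, morning 0.6). The variable names are illustrative and do not come from the script.

```perl
use strict;
use warnings;

# Sample values copied from the two input files shown above.
my $prob_seq  = 0.5;                              # "good morning 0.5"
my %prob_word = (good => 0.5, morning => 0.6);    # second file

# Denominator of the formula: product of (prob(word) - prob(sequence)).
my $denominator = 1;
$denominator *= $prob_word{$_} - $prob_seq for qw(good morning);

# The factor for "good" is 0.5 - 0.5 = 0, so the whole product is 0
# and the division dies; eval traps the error so it can be printed.
my $ratio = eval { $prob_seq / $denominator };
print "denominator = $denominator\n";
print "error: $@" unless defined $ratio;
```

A single word whose probability equals the sequence probability is enough to zero the whole product, whatever the other words contribute.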

In those cases, is there a way to change the values from the second file by adding a small number (smoothing)?

#!/usr/bin/perl
use strict; ## PLE
use warnings;

my $inFile = "file1.txt";
my $outFile ="TEST.txt";
my %hashFR = getVocab("file2.txt");
my @result;

my $bloc = 50000;
my $cmp = 0;

open my $fileIn, '<', $inFile or die "Cannot open $inFile: $!";
while (<$fileIn>) {
    chomp;
    my $flag = 0;
    my $ligne = $_;
    my @words = getWords($ligne);
    if (my $prob = pop @words) {
        $prob  =~ s/\(//g;
        my $probWords = 1;

        foreach my $word (@words) {
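            # Words missing from the vocabulary count as probability 0
            # (explicit here, instead of relying on an undefined value).
            my $probWord = exists $hashFR{$word} ? $hashFR{$word} : 0;
            $probWords *= $probWord - $prob;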
        }

        my $calc = $prob*log2($prob/($probWords));
        my $result10 = sprintf("%.10f", $calc);
        push @result, join(' ',@words) ." (".$result10.")\n";
    }
}

#if(scalar(@result) == $bloc)
{
    $cmp += $bloc;
    print "$cmp lignes traités\n";
    writeToResultFile($outFile,@result);
    @result = ();
}

sub getWords {
    my ($ligne) = @_;   # read the argument, not the global $_

    my @words = split(' ', $ligne);

    return @words;
}

sub getVocab {
    my ( $filename ) = @_;
    my %hash = ();

    open my $fileVocab, '<', $filename or die "Cannot open $filename: $!";
    while (<$fileVocab>) {
        chomp;

        # Keep only lines made of exactly two space-separated fields.
        if (2 == (my($mot, $prob) = split( / / ))) {
            $hash{trim($mot)} = trim($prob);
        }
    }
    close $fileVocab;
    return %hash;
}

sub writeToResultFile {
    my ($filename,@res) = @_;
    open(my $info, '>>', $filename) or die "Cannot open $filename: $!";
    foreach (@res) {
        print {$info} $_;
    }
    close $info;
}
sub log2 {
    my $n = shift;
    return log($n) / log(2);
}

sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

1 answer:

Answer 0 (score: 2)

You can use exception handling like this:

my $calc;
eval {
 $calc = $prob*log2($prob/($probWords));
};
if ($@){
  $calc = 0;#or whatever suits you
}

Or more simply:

my $calc = eval { $prob*log2($prob/($probWords)) } // 'NaN';
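The question also asked about smoothing rather than trapping the error. The following is only a sketch of that alternative, not the asker's or answerer's code: `smoothed_score` and `$EPSILON` are hypothetical names, the value `1e-6` is an arbitrary assumption, and the function follows the formula as stated in the question (without the extra multiplication by `$prob` that the script performs).

```perl
use strict;
use warnings;

# Hypothetical smoothing constant; the value is an assumption to tune.
my $EPSILON = 1e-6;

# Compute log2( prob(sequence) / product of (prob(word) - prob(sequence)) ),
# replacing any factor that is exactly zero with $EPSILON.
# Returns undef when the ratio is not positive, because the
# logarithm is undefined there.
sub smoothed_score {
    my ($prob_seq, @prob_words) = @_;
    my $denominator = 1;
    for my $prob_word (@prob_words) {
        my $diff = $prob_word - $prob_seq;
        $diff = $EPSILON if $diff == 0;    # smoothing step
        $denominator *= $diff;
    }
    my $ratio = $prob_seq / $denominator;
    return if $ratio <= 0;
    return log($ratio) / log(2);
}

# Sample values from the question: "good morning 0.5", good 0.5, morning 0.6.
my $score = smoothed_score(0.5, 0.5, 0.6);
printf "%.10f\n", $score if defined $score;
```

Adjusting the difference in memory leaves the second file untouched. Note that a word probability lower than the sequence probability can still make the ratio negative, which is why the sketch returns undef in that case instead of calling `log`.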