Question

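I have a text in file F1 with one sentence per line, and another file that contains the part of speech (POS) of each word in the text, for example: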

F1 contains:

he lives in paris\n
he jokes

F2 contains:

he pro\n
lives verb\n
in prep\n
paris adv_pl\n
he pro\n
jokes verb\n

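I want to parse each sentence of F1 and extract the POS of each word. I managed to extract the POS for the first sentence, but the program fails to process the second line. Here is the code: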

open( FILE,    $filename )       || die "Problème d'ouverture du ficher en entrée";
open( FILEOUT, ">$filenameout" ) || die "Problème d'ouverture";

open( F,  "/home/ahmed/Bureau/test/corpus.txt" ) || die " Pb pour ouvrir";
open( F2, "/home/ahmed/Bureau/test/corp.txt" )   || die " Pb pour ouvrir";
my $z;
my $y = 0;
my $l;
my $li;
my $pos;

while ( $ligne = <F> ) {

    while ( $li = <F2> ) {    # F2 POS
        chomp($li);
        # $prem holds the word at the start of each F2 line,
        # $deux holds the POS of that word
        ( $prem, $deux ) = ( $li =~ m/^\W*(\w+)\W+(\w+)/ );
        print "premier: $prem\n";

        chomp($ligne);
        @val = split( / /, $ligne );   # corpus de texte
        $l = @val;

        while ( $y < $l ) {  # $l length of sentence
            $z = $val[$y];
            print "z : $z\n";

            if ( $z eq $prem ) {
                print "true\n";
                $pos .= "POSw" . $y . "=" . $deux . " ";
                ++$y;
            } else {
                last;
            }
        }
    }
    print FILEOUT "$pos\n";
    $pos = "";
}

My result in the terminal:

premier: he
z : he
true

premier : lives
z : lives
true

premier : in
z : in
true

premier : paris
z : paris
true
premier : he
premier : jokes

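The first sentence has 4 words; once those 4 words have been processed, the program should move on to the next line of the text, but I cannot get it to do that.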

Answer 1

There are a few problems in your script.

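You must always use strict; use warnings; to reveal the most common syntax and/or typing errors, variables used only once, and so on.
You should always use the three-argument form of open with lexical filehandles instead of global FILEHANDLEs (see perlopentut).
You should give your filehandles sensible names: not FH, FH1 and so on, but $fh_sentences and $fh_grammar (or something else meaningful).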
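
The three general points above can be sketched as follows. This is only an illustration, not the original poster's code: the in-memory string $grammar_text stands in for a real file, and the helper name read_lines is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of F2; a real script would pass a file name.
my $grammar_text = "he pro\nlives verb\n";

# Read all lines through a lexical filehandle opened with three-argument open.
sub read_lines {
    my ($source) = @_;    # a file name, or a reference to a string for in-memory data
    open( my $fh_grammar, '<', $source ) or die "Cannot open input: $!\n";
    chomp( my @lines = <$fh_grammar> );
    close($fh_grammar);
    return @lines;
}

my @lines = read_lines( \$grammar_text );
print "$_\n" for @lines;
```

Because $fh_grammar is a lexical variable, it is only visible inside read_lines, so it cannot clash with a handle of the same name elsewhere in the script.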

So much for the general part. Now let's get more specific:

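Your outer loop (F) reads the sentences one by one. The inner loop (F2) reads the grammar types, but it only runs to completion once, for the first sentence. Once file F2 has been read to the end, every subsequent call to <F2> returns undef, because the file has already been consumed. You must either reset the file pointer to the beginning of the file after each sentence or, even better, read file F2 in advance and store its contents in a hash.
Iterating over the words of a sentence is easier with foreach my $word (@words). There is no need to manage an index variable (such as $y) yourself.
The chomp and split of the sentence should be moved out of the F2 loop, because $ligne does not change inside that loop and repeating them only burns CPU cycles.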
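
To illustrate the first point, here is a small sketch of what happens when a filehandle is read a second time, and how seek resets it. The in-memory string and the helper count_remaining_lines are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for file F2.
my $grammar_text = "he pro\nlives verb\nin prep\n";
open( my $fh_grammar, '<', \$grammar_text ) or die "Cannot open grammar data: $!\n";

# Count the lines that can still be read from the handle's current position.
sub count_remaining_lines {
    my ($fh) = @_;
    my $count = 0;
    while ( defined( my $line = <$fh> ) ) {
        $count++;
    }
    return $count;
}

my $first_pass  = count_remaining_lines($fh_grammar);    # reads up to end of file
my $second_pass = count_remaining_lines($fh_grammar);    # handle is already at end of file

# Move the file pointer back to the start so the data can be read again.
seek( $fh_grammar, 0, 0 ) or die "Cannot rewind: $!\n";
my $after_seek = count_remaining_lines($fh_grammar);

print "first pass: $first_pass, second pass: $second_pass, after seek: $after_seek\n";
close($fh_grammar);
```

Rewinding works, but it rereads the whole file for every sentence, which is why loading F2 into a hash once is usually the better choice.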

Putting all of this together, I ended up with this:

use strict;
use warnings;

# Read the grammar file, F2, into a hash:
my %grammar;
open( my $fh_grammar, '<', 'F2' ) or die "Pb pour ouvrir F2: $!\n";
while( my $ligne = <$fh_grammar> ) {
    my ($prem, $deux) = ( $ligne =~ m/^\W*(\w+)\W+(\w+)/ );
    next unless defined $prem;    # skip lines that do not look like "word POS"
    $grammar{$prem} = $deux;
}
close($fh_grammar);

# The hash is now:
#   %grammar = (
#       'he'    => 'pro',
#       'lives' => 'verb',
#       'in'    => 'prep',
#       'paris' => 'adv_pl',
#       'jokes' => 'verb',
#   );

# Read the sentences from F1 and check the grammar:
open( my $fh_sentences, '<', 'F1' ) or die "Pb pour ouvrir F1: $!\n";
while( my $ligne = <$fh_sentences> ) {
    my @words = split(/\s+/, $ligne );
    foreach my $word (@words) {
        print "z:    $word\n";
        if ( exists $grammar{$word} ) {
            print "true; $grammar{$word}\n";
        }
    }
    print "\n";
}
close($fh_sentences);

Output:

z:    he
true; pro
z:    lives
true; verb
z:    in
true; prep
z:    paris
true; adv_pl

z:    he
true; pro
z:    jokes
true; verb
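
The hash keeps a single tag per word. Since F2 in the question lists the tags in the same order as the words of the text, an alternative (not the approach used above) is to read one F2 line per word and build the "POSw0=... POSw1=..." line that the question's code tries to produce. The following is a minimal sketch under that assumption; the in-memory strings and the name tag_sentences are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for F1 and F2.
my $sentences_text = "he lives in paris\nhe jokes\n";
my $grammar_text   = "he pro\nlives verb\nin prep\nparis adv_pl\nhe pro\njokes verb\n";

# Return one "POSw<i>=<tag> ..." string per sentence, consuming one tag line per word.
sub tag_sentences {
    my ( $fh_sentences, $fh_grammar ) = @_;
    my @tagged;
    while ( my $sentence = <$fh_sentences> ) {
        my @words = split ' ', $sentence;
        my @parts;
        for my $i ( 0 .. $#words ) {
            my $line = <$fh_grammar>;
            die "Grammar data ended before word '$words[$i]'\n" unless defined $line;
            my ( $word, $tag ) = $line =~ m/^\W*(\w+)\W+(\w+)/
                or die "Malformed grammar line: $line";
            die "Expected '$words[$i]' but found '$word'\n" unless $word eq $words[$i];
            push @parts, "POSw$i=$tag";
        }
        push @tagged, join ' ', @parts;
    }
    return @tagged;
}

open( my $fh_sentences, '<', \$sentences_text ) or die "Cannot open sentences: $!\n";
open( my $fh_grammar,   '<', \$grammar_text )   or die "Cannot open grammar: $!\n";
print "$_\n" for tag_sentences( $fh_sentences, $fh_grammar );
```

This variant fails loudly if the two files get out of step, which the hash lookup cannot detect.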

Answer 2

You can solve the problem above as follows:

First, read the POS file and put it into a hash.
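
A minimal sketch of this step might look like the following. It is an assumption about what is meant, not code from the answer: the string $pos_text stands in for the POS file, and read_pos_hash is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the POS file (F2 in the question).
my $pos_text = "he pro\nlives verb\nin prep\nparis adv_pl\nhe pro\njokes verb\n";

# Build a word => POS hash from an open filehandle.
sub read_pos_hash {
    my ($fh) = @_;
    my %pos_of;
    while ( my $line = <$fh> ) {
        my ( $word, $tag ) = split ' ', $line;
        next unless defined $tag;    # skip blank or incomplete lines
        $pos_of{$word} = $tag;       # a repeated word keeps its last tag
    }
    return %pos_of;
}

open( my $fh_pos, '<', \$pos_text ) or die "Cannot open POS data: $!\n";
my %pos_of = read_pos_hash($fh_pos);
close($fh_pos);

print "$_ => $pos_of{$_}\n" for sort keys %pos_of;
```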

Now read your file and replace each word with its POS.
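
As a hedged sketch of this step: the hash %pos_of is written out by hand here so the example stands on its own, and replace_with_pos is a hypothetical helper name, not something taken from the answer.

```perl
use strict;
use warnings;

# Hypothetical word => POS table; in practice it would be read from the POS file.
my %pos_of = (
    he    => 'pro',
    lives => 'verb',
    in    => 'prep',
    paris => 'adv_pl',
    jokes => 'verb',
);

# Replace every known word in a sentence with its POS; unknown words are kept unchanged.
sub replace_with_pos {
    my ( $sentence, $pos_of ) = @_;
    my @out;
    for my $word ( split ' ', $sentence ) {
        push @out, exists $pos_of->{$word} ? $pos_of->{$word} : $word;
    }
    return join ' ', @out;
}

# Hypothetical stand-in for the text file (F1 in the question).
my $text = "he lives in paris\nhe jokes\n";
open( my $fh_text, '<', \$text ) or die "Cannot open text: $!\n";
while ( my $sentence = <$fh_text> ) {
    print replace_with_pos( $sentence, \%pos_of ), "\n";
}
close($fh_text);
```

Passing the hash by reference keeps the helper independent of any global variable.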

I believe this is a better way to solve the problem. Hope this solves your problem.

