Question

I have two files that I want to read line by line (the first contains one word per line, the second contains one sentence per line).

The goal is to count, for each word in file 1, the number of sentences in file 2 that contain that word.

Here is my code:

open( my $words, '<:utf8', 'test' ) or die "Unable to open for read: $!";          # 'test' is the file that contains my words
open( my $sentences, '<:utf8', 'sentences' ) or die "Unable to open for read: $!"; # 'sentences' contains one sentence per line
open( my $fh_resultat, '>:utf8', 'result' ) or die "Unable to open for write: $!";
my $word;
# I want to count, for each word in $words, the number of sentences in $sentences that contain it
while( defined( $word = <$words> ) ) {
    chomp $word ;
    $word =~ s/^\s*|\s*$//g;
    my $nb = 0;
    my $idf;
    my $ph;
    while (defined ( $ph = <$sentences> ) ){
        my @tab = split(/ /, $ph);
        chomp @tab ;
        foreach my $val(@tab) {
            if($word eq $val){
                $nb = $nb + 1;
                last;
            }
        }
    }
    print $fh_resultat "$word:$nb\n";
}

But the processing is only carried out for the first word of the first file!

Answer 1

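Once you have read a filehandle to end-of-file, the next read on that filehandle returns undef. It will keep returning undef no matter how many times you call it.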
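A minimal sketch of this behavior, using a hypothetical in-memory filehandle (`open` on a scalar reference) in place of a real file: once the loop has drained the handle, every further `<$fh>` read yields undef.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file (hypothetical data).
my $data = "first\nsecond\n";
open( my $fh, '<', \$data ) or die "Unable to open in-memory handle: $!";

# Drain the handle, exactly like the inner while loop in the question.
my @lines;
while ( defined( my $line = <$fh> ) ) {
    push @lines, $line;
}

# The handle is now at end-of-file: later reads keep returning undef.
my $again     = <$fh>;
my $once_more = <$fh>;
print scalar(@lines), " lines read; further reads are ",
    ( ( defined($again) || defined($once_more) ) ? 'defined' : 'undef' ), "\n";
# prints "2 lines read; further reads are undef"
```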

You cannot loop over the phrases file again unless you use the seek() function to reset the file pointer to the beginning of the file:

seek $sentences, 0, 0;
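Applied to the question's code, this means rewinding the sentences handle at the start of every outer-loop iteration. Below is a minimal sketch under that assumption; the sub `count_with_seek` and the sample data are hypothetical, and in-memory filehandles stand in for the `test` and `sentences` files (with real files you would keep the question's `open` calls and `:utf8` layer).

```perl
use strict;
use warnings;

# For each word, count the sentences that contain it as a space-separated
# token, rewinding the sentences handle with seek() before every pass.
sub count_with_seek {
    my ( $words_fh, $sentences_fh ) = @_;
    my @results;
    while ( defined( my $word = <$words_fh> ) ) {
        chomp $word;
        $word =~ s/^\s+|\s+$//g;
        # Without this, the second and later words would see an exhausted handle.
        seek( $sentences_fh, 0, 0 ) or die "Unable to rewind sentences: $!";
        my $nb = 0;
        while ( defined( my $ph = <$sentences_fh> ) ) {
            chomp $ph;
            # Count each sentence at most once, like the question's `last`.
            $nb++ if grep { $_ eq $word } split / /, $ph;
        }
        push @results, "$word:$nb";
    }
    return @results;
}

# Hypothetical sample data in in-memory filehandles.
my $words     = "cat\ndog\nbird\n";
my $sentences = "the cat sat\na dog and a cat\nno animals here\n";
open( my $w_fh, '<', \$words )     or die "Unable to open words: $!";
open( my $s_fh, '<', \$sentences ) or die "Unable to open sentences: $!";

my @counts = count_with_seek( $w_fh, $s_fh );
print "$_\n" for @counts;    # cat:2, dog:1, bird:0 (one per line)
```

Note that every word still re-reads the whole sentences file, so the work grows with the number of words times the size of the sentences file.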

Alternatively, you could consider reading one (or both) of the files into memory, so that you don't need to keep re-reading the file.
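A sketch of the in-memory alternative, assuming the sentences file is small enough to hold in memory: it is read once into an array, and each word is then checked against that array instead of the exhausted filehandle. The sub names and sample data are hypothetical.

```perl
use strict;
use warnings;

# True if any space-separated token of $sentence equals $word
# (the same whole-token comparison as the question's inner loop).
sub sentence_has_word {
    my ( $sentence, $word ) = @_;
    return scalar grep { $_ eq $word } split / /, $sentence;
}

# Read all sentences into an array once, then test every word against it.
sub count_in_memory {
    my ( $words_fh, $sentences_fh ) = @_;
    my @sentences = <$sentences_fh>;    # list context: one element per line
    chomp @sentences;
    my @results;
    while ( defined( my $word = <$words_fh> ) ) {
        chomp $word;
        $word =~ s/^\s+|\s+$//g;
        # grep in scalar context returns how many sentences matched.
        my $nb = grep { sentence_has_word( $_, $word ) } @sentences;
        push @results, "$word:$nb";
    }
    return @results;
}

# Hypothetical sample data in in-memory filehandles.
my $words     = "cat\ndog\nbird\n";
my $sentences = "the cat sat\na dog and a cat\nno animals here\n";
open( my $w_fh, '<', \$words )     or die "Unable to open words: $!";
open( my $s_fh, '<', \$sentences ) or die "Unable to open sentences: $!";

my @counts = count_in_memory( $w_fh, $s_fh );
print "$_\n" for @counts;
```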

Answer 2

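Looking at your code: the processing is only performed for the first word of the file because you iterate through the entire "sentences" file while handling the first line of the "words" file.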

Two solutions have already been mentioned: using seek, and loading the file into memory.

I would recommend loading the files into memory and processing them accordingly.
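One possible way to "process them accordingly" (a swapped-in technique, not necessarily what this answer had in mind) is to load the word list into a hash and read the sentences file only once, incrementing a counter for each distinct listed word that a sentence contains. The sub `count_single_pass` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Read the word list into memory once, then scan the sentences a single time.
sub count_single_pass {
    my ( $words_fh, $sentences_fh ) = @_;
    my ( @order, %count );    # @order keeps input order, %count gives fast lookup
    while ( defined( my $word = <$words_fh> ) ) {
        chomp $word;
        $word =~ s/^\s+|\s+$//g;
        next if exists $count{$word};    # skip duplicate words
        push @order, $word;
        $count{$word} = 0;
    }
    while ( defined( my $ph = <$sentences_fh> ) ) {
        chomp $ph;
        # Collect distinct tokens so a sentence counts at most once per word.
        my %seen;
        $seen{$_} = 1 for split / /, $ph;
        for my $token ( keys %seen ) {
            $count{$token}++ if exists $count{$token};
        }
    }
    return map { "$_:$count{$_}" } @order;
}

# Hypothetical sample data in in-memory filehandles.
my $words     = "cat\ndog\nbird\n";
my $sentences = "the cat sat\na dog and a cat\nno animals here\n";
open( my $w_fh, '<', \$words )     or die "Unable to open words: $!";
open( my $s_fh, '<', \$sentences ) or die "Unable to open sentences: $!";

my @counts = count_single_pass( $w_fh, $s_fh );
print "$_\n" for @counts;
```

Each file is read exactly once, and the hash lookup replaces re-scanning the sentences for every word.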

