Question

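Here is my current script, which tries to compare the words in file_all.txt against the words in file2.txt. It should print any words in file_all that are not in file2.

I need to format these as one word per line, but that is not the more pressing issue.

I'm new to Perl... I work more in C and Python, but this is a bit tricky, and I know my variable assignment is off.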

 use strict;
 use warnings;

 my $file2 = "file_all.txt";   %I know my assignment here is wrong
 my $file1 = "file2.txt";

 open my $file2, '<', 'file2' or die "Couldn't open file2: $!";
 while ( my $line = <$file2> ) {
     ++$file2{$line};
     }

 open my $file1, '<', 'file1' or die "Couldn't open file1: $!";
 while ( my $line = <$file1> ) {
     print $line unless $file2{$line};
     }

Edit: OH, it should ignore case... Pie is the same as PIE when comparing. And remove apostrophes.

These are the errors I get:

"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.

Answer 1

You're almost there.

The % sigil denotes a hash. You can't store a file name in a hash; you need a scalar.

my $file2 = 'file_all.txt';
my $file1 = 'file2.txt';

You need a hash to count the occurrences.

my %count;

To open the file, specify its name - it's stored in a scalar, remember?

open my $FH, '<', $file2 or die "Can't open $file2: $!";

Then, process the file line by line:

while (my $line = <$FH> ) {
    chomp $line;          # Remove newline if present.
    ++$count{lc $line};   # Store the lowercased string.
}

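Then, open the second file and process it line by line, again using lc to get the lowercased string.

To remove apostrophes, use a substitution: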

$line =~ s/'//g;  # Replace ' by nothing globally (i.e. everywhere).
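
Putting these steps together, a minimal sketch might look like the following. The helper names normalize and missing_words are made up for illustration, and the sketch reads from in-memory filehandles (opened on references to strings) rather than the real files so that it runs on its own; the sample words are assumptions too.

```perl
use strict;
use warnings;

# Hypothetical helper: strip apostrophes and lowercase a word.
sub normalize {
    my ($word) = @_;
    $word =~ s/'//g;
    return lc $word;
}

# Hypothetical helper: return the lines read from $all_fh whose
# normalized form does not occur among the lines of $known_fh.
sub missing_words {
    my ($all_fh, $known_fh) = @_;

    my %count;
    while (my $line = <$known_fh>) {
        chomp $line;
        ++$count{ normalize($line) };
    }

    my @missing;
    while (my $line = <$all_fh>) {
        chomp $line;
        push @missing, $line unless $count{ normalize($line) };
    }
    return @missing;
}

# Assumed sample contents standing in for file_all.txt and file2.txt.
my $all_text   = "Pie\ncake\nDon't\ntart\n";
my $known_text = "PIE\ndont\n";

open my $all_fh,   '<', \$all_text   or die "Can't open in-memory file: $!";
open my $known_fh, '<', \$known_text or die "Can't open in-memory file: $!";

print "$_\n" for missing_words($all_fh, $known_fh);
```

With this sample data the script prints cake and tart, one per line, because Pie matches PIE and Don't matches dont once both sides are normalized. For the real files, replace the string references with $file1 and $file2.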

Answer 2

Your error messages:

"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.

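You are assigning a file name to $file2, and then you are using open my $file2 ... The my $file2 in the second case masks the declaration in the first case. Then, in the body of the while loop, you pretend there is a hash table %file2, but you have not declared it.

You should use more descriptive variable names to avoid conceptual confusion.

For example: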

 my @filenames = qw(file_all.txt file2.txt);

Using variables with integer suffixes is a code smell.
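
As a side note on the "Global symbol" errors: in Perl, $file2 and %file2 are two unrelated variables that merely share a name, so declaring one does not declare the other. A small sketch, with the word pie used purely as sample data:

```perl
use strict;
use warnings;

my $file2 = 'file2.txt';   # scalar holding a file name
my %file2;                 # a separate hash that merely shares the name

++$file2{'pie'};           # an element of %file2; $file2 is untouched

print "scalar: $file2\n";
print "hash keys: @{[ sort keys %file2 ]}\n";
```

This compiles only because %file2 is declared explicitly. With distinct names such as $filename, $fh and %seen, the confusion cannot arise in the first place.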

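Then, factor common tasks into subroutines. In this case, what you need are: 1) a function that takes a file name and returns a table of the words in that file, and 2) a function that takes a file name and a lookup table, and prints the words that are in the file but do not appear in the lookup table.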

#!/usr/bin/env perl

use strict;
use warnings;

use Carp qw( croak );

my @filenames = qw(file_all.txt file2.txt);

print "$_\n" for @{ words_notseen(
    $filenames[0],
    words_from_file($filenames[1])
)};

sub words_from_file {
    my $filename = shift;
    my %words;

    open my $fh, '<', $filename
        or croak "Cannot open '$filename': $!";

    while (my $line = <$fh>) {
        $words{ lc $_ } = 1 for split ' ', $line;
    }

    close $fh
        or croak "Failed to close '$filename': $!";

    return \%words;
}

sub words_notseen {
    my $filename = shift;
    my $lookup = shift;

    my %words;

    open my $fh, '<', $filename
        or croak "Cannot open '$filename': $!";

    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            unless (exists $lookup->{lc $word}) {   # lookup keys are lowercased
                $words{ $word } = 1;
            }
        }
    }

    return [ keys %words ];
}

Answer 3

As you mentioned in the question, it should print any words in file_all that are not in file2.

The following small script does this:

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2) = qw(file_all.txt file2.txt);

open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";

while (<$fh1>)
{
    last if eof($fh2);
    my $compline = <$fh2>;
    chomp($_, $compline);
    if ($_ ne $compline)
    {
        print "$_\n";
    }
}

file_all.txt:

ab
cd
ee
ef
gh
df

file2.txt:

zz
yy
ee
ef
pp
df

Output:

ab
cd
gh
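
Note that the script above compares line N of one file with line N of the other, so it relies on matching words sitting at the same position in both files. If the words can appear in any order, a lookup hash gives a true membership test. A minimal sketch using the same sample data, inlined as arrays (the variable names are illustrative):

```perl
use strict;
use warnings;

# The sample data from above, inlined so the sketch runs on its own.
my @all_words   = qw(ab cd ee ef gh df);
my @known_words = qw(zz yy ee ef pp df);

# Build a lookup hash from the second list, then keep only the words
# from the first list that are not in it.
my %known   = map { $_ => 1 } @known_words;
my @missing = grep { !$known{$_} } @all_words;

print "$_\n" for @missing;
```

For this data the result is the same three words (ab, cd, gh), but it would stay the same if the lines of file2.txt were shuffled.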

Answer 4

The problem is with the following two lines:

 my %file2 = "file_all.txt";
 my %file1 = "file2.txt";

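Here you are assigning a single value, which Perl calls a SCALAR, to a hash (denoted by the % sigil). Hashes consist of key-value pairs separated by the arrow operator (=>), e.g.: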

my %hash = ( key => 'value' );

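Hashes expect an even number of arguments, since each key must be given together with a value. You are currently supplying only a single value to each hash, which is why this error is thrown.

To assign a value to a SCALAR, use the $ sigil: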

 my $file2 = "file_all.txt";
 my $file1 = "file2.txt";
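
To see what the single-value hash assignment actually does, here is a small sketch. It captures warnings through $SIG{__WARN__} so they can be printed alongside the result; under use warnings, perl reports "Odd number of elements in hash assignment" and stores the string as a key with an undefined value.

```perl
use strict;
use warnings;

my @warnings;
my %file2;
{
    # Capture warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    %file2 = "file_all.txt";   # one value, not a key/value pair
}

my $file2 = "file_all.txt";    # a scalar holds a single value

print "warning: $warnings[0]";
print "hash keys: @{[ keys %file2 ]}\n";
print "scalar: $file2\n";
```

The hash ends up with the file name as its only key, which is not useful for opening a file; the scalar holds the name directly.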

Perl: Comparing words in two files

4 answers: