Question

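I've tried everything to build a list of the words that appear only in one file when it is compared against another file. I put some debug prints in the code to see where it goes, and found that the code never does anything inside the comparison loop.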

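I think I'm blind or overlooking something really obvious. Could someone please point out the mistake? Feel free to laugh at my "probably newbie" error.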

while (<IN>) { #read the file

    chomp;

    $_ = lc; #convert to lower case
    s/ -- / /g; #remove double hyphen dashes
    s/ - / /g; #remove single hyphen dashes
    s/ +/ /g; #replace multiple spaces with one space
    s/[~`@#$%^&*-+=<>.,:;?"!_()\[\]]//g; #remove punctuation

    @hwords = split;
#   foreach $w (@hwords) { print  "$w \n";}

}
while (<IN0>) { #read the file

    chomp;

    $_ = lc; #convert to lower case
    s/ -- / /g; #remove double hyphen dashes
    s/ - / /g; #remove single hyphen dashes
    s/ +/ /g; #replace multiple spaces with one space
    s/[~`@#$%^&*-+=<>.,:;?"!_()\[\]]//g; #remove punctuation

    @awords = split;
#    foreach $w (@awords) {print "$w\n";}

}

$count =0;

@unique = ();

print "got here!\n"; # YES - it gets here

foreach  $w (@hwords) { print  "$w \n";}

foreach  $h (@hwords) {

    $x=1;
    print "got there!\n"; # NOPE, doesn't get here
    foreach $a (@awords) {
        if ($h eq $a) {
            $x=0;
            print "equals\n";  # NEVER see this
        }
    }
    if ($x eq 1) {
        ++$count;
        @unique = @unique, $h;
        print "$count, $h\n";  # NEVER see this, either
    }
}

Answer 1

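First, every iteration of your loops completely replaces @hwords and @awords. So by the end, @hwords and @awords contain only the words from the last line of their respective files.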
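A small demonstration of that behaviour, using a hypothetical three-line input whose last line is blank, read through an in-memory filehandle instead of a real file. With plain assignment the array ends up empty when the last line is blank, which would also explain why "got there!" is never printed; `push` keeps the words of every line.

```perl
use strict;
use warnings;

# Hypothetical input: two lines of words followed by a blank line.
my $text = "alpha beta\ngamma delta\n\n";

# Pattern from the question: plain assignment inside the loop.
open my $in, '<', \$text or die "open: $!";
my @overwritten;
while (<$in>) {
    @overwritten = split;    # replaces the whole array on every line
}
close $in;

# Accumulating pattern: push appends each line's words.
open $in, '<', \$text or die "open: $!";
my @accumulated;
while (<$in>) {
    push @accumulated, split;    # keeps the words of every line
}
close $in;

print "overwritten: (@overwritten)\n";    # empty: the last line was blank
print "accumulated: (@accumulated)\n";
```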

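You only need to collect the words from the first file. Then, while reading the second file, compare its words against the ones stored from the first file.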

So, in the first loop, instead of assigning to @hwords, build a lookup hash:

$hwords{$_} = 1 for split;

Now, once the first file has been read, all of its words are keys of the %hwords hash.

Then, in the second loop, as you read the second file, look up each of its words in the lookup hash:

print "Word not found: $_\n"
    for grep { !$hwords{$_} } split;
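Putting the two steps together, a minimal sketch might look like this. `words_not_in_first` is a hypothetical helper name, in-memory filehandles stand in for the real files, and only lower-casing is applied (the punctuation stripping from the question is left out to keep it short).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words read from $second_fh that never
# appear in $first_fh. Only lower-casing is applied to each line.
sub words_not_in_first {
    my ($first_fh, $second_fh) = @_;

    # First pass: every word of the first file becomes a key of %hwords.
    my %hwords;
    while (my $line = <$first_fh>) {
        $hwords{$_} = 1 for split ' ', lc $line;
    }

    # Second pass: keep each word of the second file with no entry in %hwords.
    my @missing;
    while (my $line = <$second_fh>) {
        push @missing, grep { !$hwords{$_} } split ' ', lc $line;
    }
    return @missing;
}

# Demo with in-memory filehandles standing in for the two files.
my $first_text  = "The cat sat\n";
my $second_text = "the dog sat\nA cat ran\n";
open my $fh1, '<', \$first_text  or die "open: $!";
open my $fh2, '<', \$second_text or die "open: $!";
print "Word not found: $_\n" for words_not_in_first($fh1, $fh2);
```

A word that occurs several times in the second file is reported each time; a second `%seen` hash would be one way to report it only once.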

Answer 2

This is a FAQ, and the solution can be found in the Perl FAQ:

perldoc -q intersect

Thanks to @Botje in #perl on irc.freenode.net for reminding me of this.
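The FAQ entry that `perldoc -q intersect` turns up (in perlfaq4) solves this with a counting hash. The sketch here follows that idea rather than copying the FAQ verbatim; `compare_lists` is a hypothetical name, and, like the FAQ, it assumes neither list contains duplicates of its own.

```perl
use strict;
use warnings;

# Counting-hash technique along the lines of perlfaq4. Assumes each input
# list has already been de-duplicated.
sub compare_lists {
    my ($first, $second) = @_;    # two array references
    my %count;
    $count{$_}++ for @$first, @$second;

    my (@union, @intersection, @difference);
    for my $word (sort keys %count) {
        push @union, $word;
        if ($count{$word} > 1) {
            push @intersection, $word;    # seen in both lists
        }
        else {
            push @difference, $word;      # seen in only one list
        }
    }
    return (\@union, \@intersection, \@difference);
}

my ($union, $isect, $diff) = compare_lists([qw(a b c d)], [qw(c d e f)]);
print "union:        @$union\n";
print "intersection: @$isect\n";
print "difference:   @$diff\n";
```

Note that this "difference" is symmetric: it holds words found in either list but not both. For a one-sided result, such as only the words of the second file, a lookup hash as in Answer 1 is simpler.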

Answer 3

Have a look at this:

 use strict;
 use warnings;
 use Array::Utils qw(:all);

 my @a = qw( a b c d );   # first list
 my @b = qw( c d e f );   # second list

 # array_minus(@x, @y) returns the items of @x that are not in @y

 #get items from the first list that are not in the second list
 my @notinsecond = array_minus( @a, @b );

 #get items from the second list that are not in the first list
 my @notinfirst = array_minus( @b, @a );

 print join( "\n", @notinfirst ), "\n";
 print join( "\n", @notinsecond ), "\n";
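If you would rather avoid the CPAN dependency, the same one-sided difference can be sketched in core Perl with a hash slice. `minus` is a hypothetical helper name, and unlike `array_minus` it takes array references explicitly.

```perl
use strict;
use warnings;

# Hypothetical core-Perl helper mirroring Array::Utils::array_minus:
# returns the elements of @$from that do not occur in @$remove.
sub minus {
    my ($from, $remove) = @_;    # two array references
    my %in_remove;
    @in_remove{@$remove} = ();   # hash slice: one key per element
    return grep { !exists $in_remove{$_} } @$from;
}

my @a = qw( a b c d );
my @b = qw( c d e f );
my @notinsecond = minus( \@a, \@b );    # in @a but not in @b
my @notinfirst  = minus( \@b, \@a );    # in @b but not in @a
print "@notinsecond\n";
print "@notinfirst\n";
```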

Compare two word lists in Perl and save the words that are not in the second list
