Question

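I am an incredible novice at Perl, and I have never been a phenomenal programmer. I have written some successful BVA routines for controlling microprocessor functions, but never anything embedded or multi-layered. Anyway, my question today is about a problem I can't get past while trying to work out how to remove duplicate lines of text from a text file I created.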

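The file may contain several identical lines of text, not necessarily next to each other, which is a problem because I am effectively comparing the file with itself line by line. So if the first and third lines are the same, I write the first line to the new file but not the third. But when I get to comparing the third line, I write it again, because the first line has been "forgotten" by my current code. I'm sure there is a simple way to do this, but I have trouble making things simple in code. Here is the code: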

my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";

open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
    while (<FILE>) {
        print "$_";
        print "$searchString\n";
        if (($_ =~ /$searchString/) and ($count == "0")) {
            ++ $count;
            print FILE2 $_;
            } else {
            print "This isn't working\n";
        }
    }
close (FILE);

close (FILE2);

Please excuse the way the filehandles and scalars don't match up. It's a work in progress... :)

Answer 1

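The secret to checking for uniqueness is to store the lines you have seen in a hash and to print only the lines that are not already in the hash.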

Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles), we get this:

my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";

open my $in_fh,  '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";

my %seen;

while (<$in_fh>) {
  print $out_fh unless $seen{$_}++;
}

But I would write this as a Unix filter: read from STDIN and write to STDOUT. That makes your program more flexible. The whole code becomes:

#!/usr/bin/perl

use strict;
use warnings;

my %seen;

while (<>) {
  print unless $seen{$_}++;
}

Assuming this is in a file called my_filter, you can call it as:

$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt

Update: But this doesn't use your $searchString variable. It's not clear to me what that is for.
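The same "seen" hash idea also works on a list that is already in memory, using grep. This is only a minimal sketch: the helper name unique_in_order and the sample lines are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the unique items in first-seen order.
sub unique_in_order {
    my @items = @_;
    my %seen;
    # $seen{$_}++ is false (0) the first time an item appears and
    # true on every later occurrence, so grep keeps first occurrences only.
    return grep { !$seen{$_}++ } @items;
}

# Sample data, assumed for the demonstration.
my @lines = ("foo\n", "bar\n", "foo\n", "baz\n", "bar\n");
print unique_in_order(@lines);
```

Because the hash is local to the helper, each call starts with a fresh record of what has been seen.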

Answer 2

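If your file is not very large, you can store each line read from the input file as a key in a hash, and then print the hash keys (sorted if necessary). Something like this: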

my %lines = ();
my $order = 1;

open my $fhi, "<", $file2 or die "Cannot open file: $!";
while( my $line = <$fhi> ) {
   $lines {$line} = $order++;
}
close $fhi;

open my $fho, ">", $file3 or die "Cannot open file: $!";

#Sort the keys, only if needed
my @ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);
for my $key( @ordered_lines ) {
   print $fho $key;
}

close $fho;
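One detail worth noting: a plain assignment overwrites the stored rank each time a line repeats, so the sort orders lines by their last appearance (the input a, b, a, c would come out as b, a, c). If first-appearance order is wanted, a sketch like the following may help. It assumes Perl 5.10 or later for the //= operator, and the helper name unique_by_first_seen is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: unique lines ordered by first appearance.
sub unique_by_first_seen {
    my @input = @_;
    my %first_seen;
    my $order = 1;
    for my $line (@input) {
        # //= stores a rank only if this line has none yet,
        # so repeats keep the rank of their first occurrence.
        $first_seen{$line} //= $order++;
    }
    return sort { $first_seen{$a} <=> $first_seen{$b} } keys %first_seen;
}

# Sample data, assumed for the demonstration.
print unique_by_first_seen("a\n", "b\n", "a\n", "c\n");
```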

Answer 3

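You need two things:

A hash to keep track of all the lines you have seen
A loop that reads the input file

Here is a simple implementation, called with an input filename and an output filename.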

To test it, I have also included a version that uses the DATA handle.

use strict;
use warnings;

open my $fh_in, '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '>', $ARGV[1] or die "Could not open file '$ARGV[1]' for writing: $!";

my %seen;

while (my $line = <$fh_in>) {

    # check if we have already seen this line
    if (not $seen{$line}) {
        print $fh_out $line;
    }

    # remember this line
    $seen{$line}++;
}

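The version below reads from the DATA handle instead. Run as-is, it prints foo, bar, asdf, asdfg and hello world, each on its own line: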

use strict;
use warnings;

my %seen;

while (my $line = <DATA>) {

    # check if we have already seen this line
    if (not $seen{$line}) {
        print $line;
    }

    # remember this line
    $seen{$line}++;
}

__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world

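Keep in mind that memory consumption grows with file size. As long as the text file is smaller than your RAM, it should be fine. Perl's hash memory consumption grows faster than linearly, but your data structure is very flat.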
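If memory does become a concern and the lines are long, one possible variation is to key the hash on a fixed-size digest of each line rather than on the line itself. The sketch below is an assumption on my part, not part of the original answer: it uses the core Digest::MD5 module, the helper name dedup_by_digest is hypothetical, and it accepts the (very small) risk that two different lines share a digest. It only saves memory when lines are longer than the 16-byte digest.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper: copies unique lines from $in to $out,
# remembering a 16-byte MD5 digest per line instead of the line text.
sub dedup_by_digest {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        print {$out} $line unless $seen{ md5($line) }++;
    }
    return;
}

# Demonstration on an in-memory filehandle (sample text is assumed).
my $text = "foo\nbar\nfoo\n";
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
dedup_by_digest($in, \*STDOUT);
close $in;
```

The helper takes filehandles rather than filenames, so it can be pointed at STDIN and STDOUT or at files opened elsewhere.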
