Removing text that spans newlines

Time: 2014-03-10 19:42:11

Tags: perl

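Using Perl and reading a file line by line, I need to delete all the text between two specific words (say "dog" and "cat"), but I don't know how to do it when the two words are on different lines. I'm trying to use the "s" modifier, which means the dot (.) can match a newline, but it doesn't work: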

use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
while(<F>) {
s/dog.*?cat//s;
print;
}
close F;

4 Answers:

Answer 0: (score: 1)

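You are reading the file line by line and applying the substitution to each line on its own, so a match can never cross a line boundary. If you want to work on the whole text at once, slurp the file instead.

Set the input record separator to undef: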
local $/;

Then, when you read from <F>, you get the entire file contents in one go, and the substitution should work.
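
As a minimal sketch of the slurp approach described above: the sample text and the in-memory filehandle are assumptions made for illustration, standing in for the real file. The `do` block keeps `local $/` confined so the rest of the program still reads line by line.

```perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle
my $text = "one dog two\nthree\nfour cat five\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $content = do {
    local $/;    # undef the input record separator inside this block only
    <$fh>;       # a single read now returns the whole file
};
close $fh;

# /s lets . match newlines, so the match can cross line boundaries
$content =~ s/dog.*?cat//s;
print $content;
```

Adding the /g modifier to the substitution would remove every dog/cat pair rather than only the first one.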

Answer 1: (score: 1)

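while (<F>) {
  # In scalar context ".." is the flip-flop operator: it turns on when "dog"
  # is found and off when "cat" is found, returning a sequence number meanwhile
  my $n = s/dog.*//s .. s/.*?cat//;
  $n ||= 0;   # the operator returns "" while it is off
  # Print lines outside the range, plus its first (1) and last ("E0" suffix) lines
  print if $n <= 1 or $n =~ /E/;
}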

Answer 2: (score: 0)

The answer above is correct. I just dealt with this problem myself. You could try:

use strict;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
my $buffer;
{ 
   local $/;
   $buffer = <F>;
   $buffer =~ s/dog.*?cat//s;
}
print $buffer;

Note that this may have side effects you don't want. Consider the input:

dog foo dog bar cat

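Do you want 'foo' to be part of what is removed? The regex engine takes the leftmost match, which starts at the first 'dog', so 'foo' is removed even though .*? is non-greedy... which may or may not be what you want.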

The CPAN module Regexp::Common::balanced can help you work out the right way to handle this kind of edge case.
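
The edge case above can be sketched as follows. The second pattern is not from the answer; it is one assumed alternative that uses a negative lookahead so the match cannot step over another "dog", which makes it start at the last "dog" before "cat". The variable names are hypothetical.

```perl
use strict;
use warnings;

my $input = "dog foo dog bar cat\n";

# Leftmost match: starts at the first "dog", so "foo" is deleted as well
(my $outer = $input) =~ s/dog.*?cat//s;

# Assumed alternative: (?!dog) stops the match from crossing a second "dog",
# so only the innermost "dog ... cat" span is deleted
(my $inner = $input) =~ s/dog(?:(?!dog).)*?cat//s;

print "outer: $outer";    # prints "outer: " followed by a newline
print "inner: $inner";    # prints "inner: dog foo " followed by a newline
```

Which of the two behaviors is right depends on the data, so it is worth deciding before choosing the pattern.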

Answer 3: (score: 0)

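Slurping the file by localizing $/ will be your simplest solution. However, if you want to process the file line by line, you just need to keep track of a $state variable: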

use strict;
use warnings;
use autodie;

my $filename = shift;
#open my $fh, '<', $filename;

my $state = 0;

while(<DATA>) {
    if ($state == 0 && s/(.*?)dog//) {
        print $1;
        $state = 1;
    }

    if ($state == 1 && s/.*?cat//) {
        $state = 2;
# If you want to handle more than one dog/cat pair, use below code
#       $state = 0;
#       redo;
    }

    if ($state != 1) {
        print;
    }
}

#close $fh;

__DATA__
1 hello world
2 more lines
3 this cat is ignored
4 and yet more
5 this has <dog ... yep, it really does
6 stuff to delete
7 this has cat>, cuz cats rock
8 Filler line
9 more <dogs are ignored.
10 more cat>s
11 more filler
12 yet more filler
13 More <dogs and cat>s and stuff
14 more filler
15 more filler
16 more <dogs and cat>s and <dogs and cat>s, see.
17 ending stuff

Output:

1 hello world
2 more lines
3 this cat is ignored
4 and yet more
5 this has <>, cuz cats rock
8 Filler line
9 more <dogs are ignored.
10 more cat>s
11 more filler
12 yet more filler
13 More <dogs and cat>s and stuff
14 more filler
15 more filler
16 more <dogs and cat>s and <dogs and cat>s, see.
17 ending stuff

If you uncomment those two lines so that multiple dog/cat pairs are filtered out, you get the following:

1 hello world
2 more lines
3 this cat is ignored
4 and yet more
5 this has <>, cuz cats rock
8 Filler line
9 more <>s
11 more filler
12 yet more filler
13 More <>s and stuff
14 more filler
15 more filler
16 more <>s and <>s, see.
17 ending stuff