Question

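I struggled with a Perl one-liner for an hour because the file had CRLF line endings. The regex had a capture group at the end of the line, and the CR was included in the match, which caused havoc when the backreference was used in the replacement.

I ended up specifying the CRLF manually in the regex, but is there a way to get Perl to handle line endings automatically, whatever they are?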

The original command was

perl -pe  's/foo bar(.*)$/foo $1 bar/g' file.txt

The "correct" command is

perl -pe  's/foo bar(.*)\r\n/foo $1 bar\r\n/g' file.txt

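I know I could also convert the line endings before processing; I'm interested in how to get Perl to handle this case gracefully.

Sample file (save it with CRLF line endings!)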

[19:06:57.033] foo barmy
[19:06:57.033] foo baryour

Expected output

[19:06:57.033] foo my bar
[19:06:57.033] foo your bar

Output with the original command (bar appears at the start of the line because the carriage return was captured in the group):

bar:06:57.033] foo my
bar:06:57.033] foo your

Answer 1

First of all, let's keep in mind that

perl -ple's/foo bar(.*)\z/foo $1 bar/g' file.txt

is short for something close to

perl -e'
   while (<>) {
      chomp;
      s/foo bar(.*)\z/foo $1 bar/g;
      print $_, $/;
   }
' file.txt

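Perl makes it possible for code to read and write local text files in a platform-independent manner.

In the comments, you asked how to read and write both local and foreign text files in a platform-independent manner.

First, you'll have to disable Perl's normal line-ending handling.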

binmode STDIN;
binmode STDOUT;

Then you'll have to handle the multiple kinds of line endings yourself.

sub mychomp { (@_ ? $_[0] : $_) =~ s/(\s*)\z//; $1 }

while (<STDIN>) {
   my $le = mychomp($_);
   s/foo bar(.*)\z/foo $1 bar/g;
   print($_, $le);
}

So instead of

perl -ple's/foo bar(.*)\z/foo $1 bar/g' file.txt

you'd have

perl -e'
   sub mychomp { (@_ ? $_[0] : $_) =~ s/(\s*)\z//; $1 }

   binmode STDIN;
   binmode STDOUT;
   while (<STDIN>) {
      my $le = mychomp($_);
      s/foo bar(.*)\z/foo $1 bar/g;
      print($_, $le);
   }
' <file

Answer 2

In newer Perls, you can use \R in your regex to strip off all end-of-line characters (it matches both \n and \r). See perldoc perlre.

Answer 3

You could say:

perl -pe 's/foo bar([^\015]*)(\015?\012)/foo $1 bar$2/g' *.txt

which would keep the line endings the same as in the input file.

You might also want to refer to perldoc perlport.

Answer 4

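Is there a way to get Perl to handle platform-specific line endings automatically?

Yes. That's actually the default.

The problem is that you are trying to handle Windows line endings on a Unix platform.

This will definitely do it: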

perl -pe'
    BEGIN {
       binmode STDIN,  ":crlf";
       binmode STDOUT, ":crlf";
    }
    s/foo bar(.*)$/foo $1 bar/g;
' <file.txt

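Might I suggest that you keep doing it manually?

Alternatively, you could convert the file to a native text file, process it, and convert it back.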

<file.orig dos2unix | perl -pe'...' | unix2dos >file.new

Answer 5

The \R escape sequence ^{Perl v5.10+; see perldoc perlrebackslash or the documentation online}, which matches "generic newlines" (platform-agnostically), can be made to work here (the example uses Bash to create the multi-line input string):

$ printf 'foo barmy\r\nfoo baryour\r\n' | perl -pe 's/foo bar(.*?)\R/foo $1 bar\n/gm'
foo my bar
foo your bar

Note that the only difference to Ether's answer is use of a non-greedy construct (.*? rather than just .*), which makes all the difference here.

Read on, if you want to know more.

Background:

It is an example of a pitfall associated with \R, which stems from the fact that it can match one or two characters - either \r\n or, typically, \n:^[1]

With the greedy (.*) construct, "my\r" (including the \r) is captured: . matches any character except \n, so .* consumes the \r, and the remaining \n by itself also satisfies \R.

By contrast, using the non-greedy (.*?) construct causes \R to match the \r\n sequence, as intended.

^{[1] \R matches MORE than just \r\n and \n: it also matches any single character that is classified as vertical whitespace in Unicode terms, which includes \x0B (vertical tab), \f (form feed), \r (by itself), and the following Unicode characters: U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR)}

How to make a Perl one-liner "line-ending agnostic"

5 answers: