Question

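I'm running Perl on Windows and I have some text files whose lines end in CRLF (0d0a). The problem is that stray 0a characters occasionally appear in the files; Perl on Windows splits lines at them, which messes up my processing. My idea was to pre-process the files by reading lines split on CRLF, but at least on Windows it insists on splitting on LF as well.

I have tried setting $/: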

local $/ = 0x0d; 
open(my $fh, "<", $file) or die "Unable to open $file";
while (my $line = <$fh>) {
    # do something to get rid of the 0x0a embedded in the line of text; 
}

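...but this reads multiple lines at once; it seems to miss the 0x0d entirely. I also tried setting it to "\n", "\n\r", "\r" and "\r\n". There must be a simple way to do this!

I need to get rid of the stray characters so I can process the files properly. So I need a script that opens a file, splits it on CRLF, finds and removes any 0a that is not preceded by 0d, and then saves the result line by line to a new file.

Thanks for any help you can provide.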

Answer 1

This solution works by reading the data in binary mode.

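open(my $INFILE, "<:raw", $infile)
    or die("Can't open \"$infile\": $!\n");
open(my $OUTFILE, ">:raw", $outfile)
    or die("Can't create \"$outfile\": $!\n");

my $buffer = '';
# Append each chunk after whatever was held back from the previous one.
while (sysread($INFILE, $buffer, 4*1024*1024, length($buffer))) {
    # Remove every LF that is not preceded by a CR.
    $buffer =~ s/(?<!\x0D)\x0A//g;

    # Hold back a trailing CR in case we cut between a CR and a LF.
    my $held = $buffer =~ s/\x0D\z// ? "\x0D" : '';
    print $OUTFILE $buffer;
    $buffer = $held;
}

# Flush a CR held back from the final chunk, if any.
print $OUTFILE $buffer;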

Answer 2

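For starters, local $/ = 0x0d; should be local $/ = "\x0d";. The literal 0x0d is the number 13, which stringifies to "13", so the record separator becomes the two-character string "13" rather than a carriage return.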
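
A quick way to see the difference between the numeric literal and the string escape is the minimal sketch below. The variable names are illustrative only and do not come from the question.

```perl
use strict;
use warnings;

# 0x0d is a numeric literal (13); used as a string it becomes the characters "13".
my $as_number = 0x0d;

# "\x0d" is a one-character string holding a carriage return.
my $as_string = "\x0d";

printf "0x0d as a string: '%s' (length %d)\n", $as_number, length $as_number;
printf "\"\\x0d\" has length %d and character code %d\n",
    length $as_string, ord $as_string;
```

Assigning the first value to $/ makes Perl look for the text "13" in the input, which is why the reads in the question appeared to ignore the carriage returns.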

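Beyond that, the problem is that a :crlf layer is added to file handles by default on Windows. This converts CRLF to LF when reading (and LF to CRLF when writing). As a result, there is no CR in the data you read, so you end up reading the whole file as a single record.

Simply remove or disable the :crlf layer.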

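use feature qw( say );   # say is not enabled by default

# Treat only CRLF as the end of a record.
local $/ = "\x0D\x0A";

# :raw disables the :crlf layer, so CRs reach the program unchanged.
open(my $fh, "<:raw", $file)
    or die("Can't open \"$file\": $!\n");

while (<$fh>) {
    chomp;         # remove the trailing CRLF
    s/\x0A//g;     # remove any stray LF left inside the line
    say;           # print the cleaned line followed by a newline
}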
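
For the file-to-file variant the question asks for, the same idea can be wrapped in a subroutine, roughly as sketched below. The name strip_stray_lf and its two path arguments are hypothetical, and the sketch assumes a file small enough to process one CRLF-terminated record at a time.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, dropping every LF
# that is not part of a CRLF pair.
sub strip_stray_lf {
    my ($in_path, $out_path) = @_;

    # :raw on both handles so no CRLF translation happens on Windows.
    open(my $in, "<:raw", $in_path)
        or die("Can't open \"$in_path\": $!\n");
    open(my $out, ">:raw", $out_path)
        or die("Can't create \"$out_path\": $!\n");

    local $/ = "\x0D\x0A";             # records end at CRLF only
    while (my $line = <$in>) {
        # chomp returns the number of characters removed:
        # 2 for a CRLF, 0 for an unterminated final line.
        my $had_crlf = chomp($line);
        $line =~ s/\x0A//g;            # any LF still present is a stray
        print $out $line, ($had_crlf ? "\x0D\x0A" : "");
    }

    close($out) or die("Can't close \"$out_path\": $!\n");
    close($in);
    return;
}
```

It could be called as strip_stray_lf("input.txt", "output.txt"), where both file names are placeholders. Writing through a :raw handle keeps the CRLF endings byte-for-byte instead of relying on the output layer to recreate them.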

Need a script to remove extra line feeds from a text file
