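Question: Why is the last line, which has no CRLF, not split correctly by Perl's split function? It should yield 5 fields.
The input file contains the following lines. The last line has no CRLF (carriage return and line feed).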
1;P7;extended ascii ÀÇÊ;;
2;P8;non-ascii ΩЖאഉて;;
3;P8;non-ascii ΩЖאഉて;;
My code is:
use strict;
use warnings;
use Encode;
use utf8;
my $COL_SEP=';';
open FL, "<:encoding(UTF-16)", $ARGV[0] or die "cannot open\n";
while(my $line= <FL>) {
chomp $line;
print "$. length=",length($line),"\n";
# print "\n",$line,"\n";
my @fields = split($COL_SEP, $line);
print "number of fields=",scalar(@fields),"\n";
}
close FL;
The output is:
1 length=26
number of fields=5
2 length=23
number of fields=5
3 length=22
number of fields=3
Answer 0 (score: 4)
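For the first line, split returns "1", "P7", "extended ascii ÀÇÊ", "", "\r".
For the second line, split returns "2", "P8", "non-ascii ΩЖאഉて", "", "\r".
For the third line, split would return "3", "P8", "non-ascii ΩЖאഉて", "", "", but by default split removes empty trailing fields.
The "\r" in the first two lines is left over because chomp removes only the "\n" of each CRLF, so those lines end in a non-empty field and keep all 5 fields.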
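The effect described above can be reproduced without any file. The following is a minimal sketch; the two sample strings are hypothetical stand-ins for the chomped lines (one still carrying "\r", one not), not the asker's actual data.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for two chomped lines.
my $with_cr    = "1;P7;abc;;\r";   # "\r" left behind after chomp
my $without_cr = "3;P8;abc;;";     # last line of the file, no CRLF at all

# Default limit: empty trailing fields are removed.
my @a = split /;/, $with_cr;       # last field is "\r", so nothing is removed
my @b = split /;/, $without_cr;    # the two empty trailing fields are dropped

# Negative limit: empty trailing fields are kept.
my @c = split /;/, $without_cr, -1;

printf "%d %d %d\n", scalar(@a), scalar(@b), scalar(@c);   # prints "5 3 5"
```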
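To get rid of the stray carriage returns, change
open(FL, "<:encoding(UTF-16)", $ARGV[0])
to
open(FL, "<:encoding(UTF-16):crlf", $ARGV[0])
If you want it to work on Windows as well, you need
open(FL, "<:raw:encoding(UTF-16):crlf", $ARGV[0])
Of course, you shouldn't use global variables, so it should be (reading from <$FL> and closing $FL accordingly)
open(my $FL, "<:raw:encoding(UTF-16):crlf", $ARGV[0])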
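As a rough, self-contained check of the layer stack above, the sketch below writes a small UTF-16 file with explicit CRLF line endings and reads it back through :raw:encoding(UTF-16):crlf. The temporary file and its two sample lines are assumptions made for illustration, not the asker's input.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for $ARGV[0].
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write UTF-16 (with BOM) and literal CRLF; the last line has no line ending.
open(my $out, ">:raw:encoding(UTF-16)", $path) or die "cannot open $path: $!\n";
print {$out} "1;P7;x;;\r\n3;P8;y;;";
close $out;

# Read back: :crlf sits on top of the decoder, so it sees decoded "\r\n".
my @lines;
open(my $in, "<:raw:encoding(UTF-16):crlf", $path) or die "cannot open $path: $!\n";
while (my $line = <$in>) {
    chomp $line;              # only "\n" remains to be removed
    push @lines, $line;
}
close $in;

print join(",", map { length } @lines), "\n";   # both lines now have the same length
```

With the "\r" gone, every line ends in empty fields, so the default split drops them consistently; keeping them needs the limit argument as well.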
To keep the empty trailing fields, also change
my @fields = split($COL_SEP, $line);
to
my @fields = split($COL_SEP, $line, -1);
Of course, the first argument of split is a regular expression pattern, so it should be
my @fields = split(quotemeta($COL_SEP), $line, -1);
or
my @fields = split(qr/\Q$COL_SEP/, $line, -1);
or
my @fields = split(/\Q$COL_SEP/, $line, -1);
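Quoting makes no difference for ';', but it matters as soon as the separator is a regex metacharacter. The sketch below uses a hypothetical '|' separator and sample line to show the difference; neither appears in the question.

```perl
use strict;
use warnings;

my $sep  = '|';        # hypothetical separator that is a regex metacharacter
my $line = "a|b||";    # hypothetical input line

# Unquoted: the pattern is an empty alternation, which matches the empty
# string, so the line is split between characters rather than on '|'.
my @unquoted = split($sep, $line, -1);

# Quoted: the separator is matched literally.
my @quoted_qm = split(quotemeta($sep), $line, -1);
my @quoted_q  = split(/\Q$sep/, $line, -1);

print scalar(@quoted_q), "\n";                  # prints 4: "a", "b", "", ""
print join(",", map { "[$_]" } @unquoted), "\n";
```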