Question

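I have a mixed-delimiter file with a header row that I am trying to read using Text::CSV, which I have used successfully in other scripts to load comma-separated files into an array of hashes. I have read that Text::CSV doesn't support multiple delimiters (spaces, tabs, commas), so I am trying to clean up each line with a regex before handing it to Text::CSV. On top of that, the data file also has comment lines in the middle of the file. Unfortunately, I don't have admin rights to install libraries that can accommodate multiple sep_chars, so I was hoping I could use Text::CSV or some other standard method to clean up the header and rows before adding them to the AoH. Or should I abandon Text::CSV?

I'm obviously still learning. Thanks in advance.

Sample file: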

#
#
#
# name scale     address      type
test.data.one   32768       0x1234fde0      float
test.data.two   32768               0x1234fde4      float
test.data.the   32768       0x1234fde8      float
# comment lines in middle of data
test.data.for   32768                 0x1234fdec      float
test.data.fiv   32768       0x1234fdf0      float

Code excerpt:

my $fh;
my $input;
my $header;
my $pkey;
my $row;
my %arrayofhashes;   

my $csv=Text::CSV({sep_char = ","})
    or die "Text::CSV error: " Text::CSV=error_diag;

open($fh, '<:encoding(UTF-8)', $input)
    or die "Can't open $input: $!";

while (<$fh>) {
    $line = $_;
    # skip to header row
    next if($line !~ /^# name/);
    # strip off leading chars on first column name
    $header =~ s/# //g;
    # replace multiple spaces and tabs with comma
    $header =~ s/ +/,/g;
    $header =~ s/t+/,/g;
    # results in $header = "name,scale,address,type"
    last;
}

my @header = split(",", $header);
$csv->parse($header);
$csv->column_names([$csv->fields]);
# above seems to work!

$pkey = 0;
while (<$fh>) {
    $line = $_;
    # skip comment lines
    next if ($line =~ /^#/);
    # replace spaces and tabs with commas
    $line =~ s/( +|\t+)/,/g;
    # replace multiple commas from previous regex with single comma    
    $line =~ s/,+/,/g;
    # results in $line = "test.data.one,32768,0x1234fdec,float"

    # need help trying to create a what I think needs to be a hash from the header and row.
    $row = ?????;
    # the following line works in my other perl scripts for CSV files when using:
    # while ($row = $csv->getline_hr($fh)) instead of the above.  
    $arrayofhashes{$pkey} = $row;
    $pkey++;
}

Answer 1

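Text::CSV is useless if your columns are separated by multiple spaces. Your code contains a lot of repeated code that tries to work around this Text::CSV limitation.

Also, your code has bad style, contains multiple syntax errors and typos, and uses confusing variable names.

So You Want to Parse a Header

We need a definition of the header line for our code. Let's take "the first comment line that contains non-space characters". It must not be preceded by any non-comment lines.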

use strict; use warnings; use autodie;

open my $fh, '<:encoding(UTF-8)', "filename.tsv";  # error handling by autodie

my @headers;
while (<$fh>) {
  # no need to copy to a $line variable, the $_ is just fine.
  chomp;                                     # remove line ending
  s/\A#\s*// or die "No header line found";  # remove comment char, or die
  /\S/ or next;                              # skip if there is nothing here
  @headers = split;                          # split the header names.
                                             # A bare `split` means `split ' ', $_`: it splits on
                                             # runs of whitespace and ignores leading whitespace
  last;                                      # break out of the loop: the header was found
}

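The \s character class matches whitespace characters (spaces, tabs, newlines, etc.). \S is the inverse and matches every non-whitespace character.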
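To see how whitespace splitting behaves on a line that mixes spaces and tabs, here is a minimal sketch. The sample line is made up for illustration (it adds leading whitespace that the sample file does not have) to show one difference between the special `' '` pattern, which is what a bare `split` uses, and an explicit `/\s+/`.

```perl
use strict;
use warnings;

# Illustrative line: leading spaces, runs of spaces and tabs, trailing newline.
my $line = "  test.data.two   32768\t\t0x1234fde4  \t float\n";

# The special pattern ' ' skips leading whitespace, then splits on /\s+/.
my @default = split ' ', $line;

# A plain /\s+/ keeps an empty leading field when the line starts with whitespace.
my @regex = split /\s+/, $line;

print scalar(@default), " fields with ' '\n";
print scalar(@regex),   " fields with /\\s+/\n";
```

For lines that never start with whitespace, such as the ones in the sample file, both forms give the same fields.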

The Rest

Now that we have the header names, we can proceed with normal parsing:

my @records;
while (<$fh>) {
  chomp;
  next if /\A#/;              # skip comments
  my @fields = split;
  my %hash;
  @hash{@headers} = @fields;  # use hash slice to assign fields to headers
  push @records, \%hash;      # add this hashref to our records
}

Voilà.
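As a way to try the two loops without a data file, the following sketch wraps them in a sub and feeds it a shortened copy of the sample data through an in-memory filehandle. The sub name `parse_records` and the extra blank-line check are assumptions added for illustration; they are not part of the answer's code.

```perl
use strict;
use warnings;

# parse_records is a hypothetical helper name: it takes an open filehandle
# and returns a list of hashrefs, one per data line.
sub parse_records {
    my ($fh) = @_;

    # Header: the first comment line that contains non-space characters.
    my @headers;
    while (<$fh>) {
        chomp;
        s/\A#\s*// or die "No header line found";
        /\S/ or next;
        @headers = split;
        last;
    }

    # Data lines: skip comments, split on whitespace, map fields to headers.
    my @records;
    while (<$fh>) {
        chomp;
        next if /\A#/;
        next unless /\S/;          # assumption: blank lines should be ignored too
        my %hash;
        @hash{@headers} = split;
        push @records, \%hash;
    }
    return @records;
}

# Shortened copy of the sample file from the question.
my $sample = <<'END';
#
#
# name scale     address      type
test.data.one   32768       0x1234fde0      float
# comment lines in middle of data
test.data.two   32768               0x1234fde4      float
END

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @records = parse_records($fh);
close $fh;

printf "%s at %s\n", $_->{name}, $_->{address} for @records;
```

Passing the filehandle in keeps the parsing logic independent of where the data comes from, so the same sub could be given a handle opened on a real file.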

The Result

This code produces the following data structure from the example data:

@records = (
  {
    address => "0x1234fde0",
    name    => "test.data.one",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fde4",
    name    => "test.data.two",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fde8",
    name    => "test.data.the",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fdec",
    name    => "test.data.for",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fdf0",
    name    => "test.data.fiv",
    scale   => 32768,
    type    => "float",
  },
);

This data structure can be used like this:

use feature 'say';  # `say` must be enabled explicitly (or via `use v5.10;`)
for my $record (@records) {
  say $record->{name};
}

or

for my $i (0 .. $#records) {
  say "$i: $records[$i]{name}";
}

Criticism of Your Code

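You declare all your variables at the top of the script, which effectively makes them globals. Don't. Create each variable in the smallest scope possible. My code uses only three variables in the outer scope: $fh, @headers and @records.
This line is not valid: my $csv=Text::CSV({sep_char = ","}).
- Text::CSV is not a function; it is the name of a module. You meant Text::CSV->new(...).
- The options should be a hashref, but sep_char = "," tries to assign something to sep_char. Sadly, this might be valid syntax. What you actually want is to specify a key-value pair, so use the => operator instead (known as the fat comma or hash rocket).
This does not work either: or die "Text::CSV error: " Text::CSV=error_diag.
- To concatenate strings, use the . concatenation operator. What you wrote is a syntax error: a string literal must always be followed by an operator.
- You really seem to like assignment, don't you? Text::CSV=error_diag does not work. You intended to call the error_diag method on the Text::CSV class, so use the correct operator, ->: Text::CSV->error_diag.
The substitution s/t+/,/g replaces every sequence of the letter t with a comma. To replace tabs, use the \t escape.
%arrayofhashes is not an array of hashes: it is a hash (as the % sigil shows), but you use integers as its keys. Arrays have the @ sigil.
To add something to the end of an array, I would rather not keep the index of the last item in an extra variable. Use the push function instead to append the item. This reduces the amount of bookkeeping code.
If you find yourself writing a loop like my $i = 0; while (condition) { do stuff; $i++}, then you usually want a C-style for loop: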
```
for (my $i = 0; condition; $i++) {
  do stuff;
}
```
This also helps to scope the variable properly.
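Several of the points above can be sketched together as follows. This is an illustration under assumptions rather than a rewrite of the question's script: the Text::CSV part only runs if that CPAN module happens to be installed, and the sample strings and variable names are made up.

```perl
use strict;
use warnings;

# 1. Constructor and error reporting: call new() on the class, use => for the
#    option pair, and join strings with the . operator. Text::CSV is not a
#    core module, so this part is skipped when it is unavailable.
my @csv_fields;
if (eval { require Text::CSV; 1 }) {
    my $csv = Text::CSV->new({ sep_char => ',' })
        or die "Text::CSV error: " . Text::CSV->error_diag;
    $csv->parse('name,scale,address,type')
        or die "Parse failed: " . $csv->error_diag;
    @csv_fields = $csv->fields;
}

# 2. /t+/ matches the letter "t"; /\t+/ matches tab characters.
my $line = "test.data.one\t32768";
(my $wrong = $line) =~ s/t+/,/g;    # mangles every "t" in the name
(my $right = $line) =~ s/\t+/,/g;   # replaces only the tab

# 3. Collect rows in a real array with push; no index variable is needed.
my @records;
for my $name (qw(test.data.one test.data.two)) {
    push @records, { name => $name };
}

print "wrong: $wrong\n";
print "right: $right\n";
print "last index: $#records\n";
```

Note that each variable is declared where it is first needed, which keeps its scope as small as the code allows.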
