Question

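I am trying to fill in the blanks in a CSV file using information from earlier rows of the same file.

I have a CSV file with three columns: Mb_size, tax_id and parent_id. There is a relationship between tax_id and parent_id. For example, in the row with an Mb_size of 22.2220658537, 5820 is the tax_id and 5819 is the parent_id. Moving up the file, 5819 appears in the tax_id column of an earlier row. A parent_id can be repeated, but each tax_id is unique within its column.

Near the top of the CSV file, some tax IDs have a corresponding Mb size next to them. I want these values passed down to fill the blanks: if a tax_id has no Mb_size next to it, take the value from above using the parent_id/tax_id relationship. I am trying to adapt an earlier script, but I cannot work out how to code the relationship.

Example input file: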

Mb_size,tax_id,parent_id

377.810518214,1,1
377.810518214,131567,1
377.810518214,2759,131567
288.886032927,5819,2759
6565.2,999923,2759
466.7350035,147429,2759
22.2220658537,5820,5819
184.801317,4557,147429
,4575,147429
555.55,1234,5819
,4321,999923
,9999,4321

Example output:

Mb_size,tax_id,parent_id
377.810518214,1,1
377.810518214,131567,1
377.810518214,2759,131567
288.886032927,5819,2759
6565.2,999923,2759
466.7350035,147429,2759
22.2220658537,5820,5819
184.801317,4557,147429
466.7350035,4575,147429
555.55,1234,5819
6565.2,4321,999923
6565.2,9999,4321

My code:

 use strict;
 use warnings;

 open my $taxa_fh, '<', $ARGV[0]
     or die qq{Failed to open "$ARGV[0]" for input: $!\n};
 open my $match_fh, '>', "$ARGV[0]_passedDOWN.csv"
     or die qq{Failed to open "$ARGV[0]_passedDOWN.csv" for output: $!\n};

 my %node_data;     # tax_id => Mb_size ('' when blank)
 my %parent;        # tax_id => parent_id
 my @node_order;    # tax_ids in file order
 my $header;
 while ( my $line = <$taxa_fh> ) {
     chomp $line;

     if ( 1 == $. ) {
         $header = $line;
         next;
     }
     next unless length $line;    # skip empty lines

     my ( $Mb_size, $tax_id, $parent_id ) = split /,/, $line;

     $parent{$tax_id} = $parent_id;
     push @node_order, $tax_id;
     $node_data{$tax_id} = $Mb_size // '';
 }

 print {$match_fh} "$header\n";
 for my $id (@node_order) {

     # Walk up the parent chain until a node with a size is found.
     # Stops at the root (its own parent) or at an unknown parent.
     my $node = $id;
     while ( ( $node_data{$node} // '' ) eq ''
         and exists $parent{$node}
         and $parent{$node} ne $node )
     {
         $node = $parent{$node};
     }
     print {$match_fh} join( ',', $node_data{$node} // '', $id, $parent{$id} ), "\n";
 }

 close $taxa_fh;
 close $match_fh;

Answer 1

perl -F, -lape '
    next if $. == 1;
    $F[0] = $size[$F[2]] if $F[0] eq "";
    $size[$F[1]] = $F[0];
    $_ = join ",", @F;
' input.file > output.file

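I am assuming there is no case where a row with a blank size has a parent whose size is also still blank when that row is read, i.e. each parent row appears earlier in the file than its children and has already been given a size.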
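The one-liner above looks up the parent's size at the moment each row is read, so it relies on parents appearing earlier in the file than their children. For input where that ordering is not guaranteed, a two-pass approach is one option. The following is only a minimal sketch: `fill_sizes` is a hypothetical helper name, and it assumes the header is the first line, the IDs form a tree, and no field contains a quoted comma. The demo rows are taken from the question's sample but deliberately reordered so that a child comes before its parent.

```perl
use strict;
use warnings;

# Hypothetical helper: takes chomped CSV lines (header first)
# and returns the lines with blank Mb_size fields filled in.
sub fill_sizes {
    my @lines  = @_;
    my $header = shift @lines;
    my ( %size, %parent, @order );

    # Pass 1: record size and parent for every tax_id, in file order.
    for my $line (@lines) {
        next unless length $line;
        my ( $mb, $tax, $par ) = split /,/, $line;
        $size{$tax}   = $mb // '';
        $parent{$tax} = $par;
        push @order, $tax;
    }

    # Pass 2: for a blank size, climb the parent chain to the nearest
    # ancestor that has one. %seen guards against loops such as the
    # root row, which is its own parent.
    my @out = ($header);
    for my $tax (@order) {
        my $node = $tax;
        my %seen;
        while ( ( $size{$node} // '' ) eq ''
            and exists $parent{$node}
            and !$seen{$node}++ )
        {
            $node = $parent{$node};
        }
        push @out, join( ',', $size{$node} // '', $tax, $parent{$tax} );
    }
    return @out;
}

# Illustrative input: row 9999 appears before its parent 4321.
my @demo = (
    'Mb_size,tax_id,parent_id',
    '6565.2,999923,2759',
    ',9999,4321',
    ',4321,999923',
);
print "$_\n" for fill_sizes(@demo);
```

Because all rows are stored before any blank is resolved, row order no longer matters, at the cost of holding the whole file in memory. A row whose ancestors all lack a size is left blank rather than guessed.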
