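Perl array built from a csv file creates line breaks in unexpected places

Date: 2015-11-29 06:58:54

Tags: perl

Hello, I have a script that converts an xlsx file to a tab-delimited file, strips any commas, removes duplicate rows, and then separates the fields with commas. (I do this to make sure users have not typed any commas into a column.) I then do some further processing and convert the result back to an xlsx file. This has always worked fine. However, rather than opening and closing files all the time, I thought I would push the data into an array and convert it to xlsx at the end. Unfortunately, when I try to convert back to an xlsx file, it creates line breaks at the spaces inside the names. If I write the output to a csv file, then open that file and convert it to xlsx, it works fine.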

#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::BasicRead;
use Excel::Writer::XLSX;
local $" = "'\n'";
open( STDERR, ">&STDOUT" );

# convert to csv
my $xlsx_WSD = ( "C:\\Temp\\testing_file.xlsx"),, 1;
my @csvtemp;

if ( -e $xlsx_WSD ) {
    my $ss  = new Spreadsheet::BasicRead($xlsx_WSD) or die;
    my $col = '';
    my $row = 0;
    while ( my $data = $ss->getNextRow() ) {
        $row++;
        $col = join( "\t", @$data );
        push @csvtemp, $col . "\n" if ( $col ne "" );
    }
}
else {
    print "    C:\\Temp\\testing_file.xlsx file EXISTS ...!!\n";
    print "    please investigate and use the restore option if required !..\n";
    exit;
}
;
my @arraynew;
my %seen;
our $Header_row = shift(@csvtemp);
foreach (@csvtemp) {
    chomp;
    $_ =~ s/,//g;
    $_ =~ s/\t/,/g;

    #   print $_ . "\n" if !$seen{$_}++ ;
    push @arraynew, $_ . "\n" if !$seen{$_}++;    # remove any dupes
}


# convert back to xlsx
my $workbook  = Excel::Writer::XLSX->new("C:\\Temp\\testing_filet.xlsx");
my $worksheet = $workbook->add_worksheet();

my ( $x, $y ) = ( 0, 0 );
while (<@arraynew>) {
    my @list = split /,/;
    foreach my $c (@list) {
        $worksheet->write( $x, $y++, $c );
    }
    $x++;
    $y = 0;
}



__DATA__

Animal  keeper  M/F Years   START DATE  FRH FSM
GIRAFFE JAMES LE    M   5   10/12/2007      Y
HIPPO   JACKIE LEAN F   6   11/12/2007      Y
ZEBRA   JAMES LEHERN    M   7   12/12/2007      Y
GIRAFFE AMIE CAHORT M   5   13/12/2012      Y
GIRAFFE MICKY JAMES M   5   14/06/2007      Y
MEERKAT JOHN JONES  M   9   15/12/2007  v   v
LEOPPARD    JIM LEE M   8   16/12/2002      


unexpected result 

GIRAFFE JAMES               
LE  M   5   10/12/2007      Y
"
HIPPO"  JACKIE              
LEAN    F   6   11/12/2007      Y
"
ZEBRA"  JAMES               
LEHERN  M   7   12/12/2007      Y
"
GIRAFFE"    AMIE                
CAHORT  M   5   13/12/2012      Y
"
GIRAFFE"    MICKY               
JAMES   M   5   14/06/2007      Y
"
MEERKAT"    JOHN                
JONES   M   9   15/12/2007  v   v
"
LEOPPARD"   JIM             
LEE M   8   16/12/2002
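
One detail in the script above may be worth checking: in Perl, `<@arraynew>` does not loop over the array. Angle brackets around anything other than a filehandle or a simple scalar are treated as `glob`, which interpolates the array into one string (joined with `$"`) and then splits that string on whitespace. That would be consistent with the names breaking at their spaces. A minimal sketch of the difference, using made-up sample rows rather than the real spreadsheet:

```perl
use strict;
use warnings;

# Two sample rows shaped like the elements of @arraynew (values are illustrative).
my @rows = ( "GIRAFFE,JAMES LE,M\n", "HIPPO,JACKIE LEAN,F\n" );

# <@rows> is parsed as glob("@rows"): the array is interpolated into one
# string, which glob then breaks into whitespace-separated patterns.
my @globbed;
while ( my $piece = <@rows> ) {
    push @globbed, $piece;
}

# Plain iteration hands each element over unchanged.
my @iterated;
foreach my $row (@rows) {
    chomp( my $copy = $row );
    push @iterated, $copy;
}

printf "glob pieces: %d, foreach rows: %d\n", scalar @globbed, scalar @iterated;
```

If this is what is happening here, iterating with `foreach (@arraynew)` and chomping each element before the `split` should keep every row intact. This is a sketch only and has not been run against the original xlsx file.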

2 Answers:

Answer 0 (score: 1)

Since you are running this on Windows, have you considered using Win32::OLE?

use strict;

use Win32::OLE;

my $app = Win32::OLE->GetActiveObject('Excel.Application')
        || Win32::OLE->new('Excel.Application', 'Quit');

my $wb = $app->Workbooks->Open("C:/Temp/testing_file.xlsx");

my $ws = $wb->ActiveSheet;

my $max_row = $ws->UsedRange->Rows->Count;
my $max_col = $ws->UsedRange->Columns->Count;

my ($row, %already) = (1);
while ($row <= $max_row) {

  my ($col, @output) = (1);

  while ($col <= $max_col) {
    my $val = $ws->Cells($row, $col)->{Text};

    if ($val =~ /[,\t]/) {
      $val =~ tr/,//d;
      $val =~ tr/\t/,/;
      $ws->Cells($row, $col)->{Value} = $val;
    }
    $output[$col - 1] = $val;
    $col++;
  }

  if ($already{join "|", @output}++) {
    $ws->Rows($row)->EntireRow->Delete;
    $max_row--;
  } else {
    $row++;    
  }
}

$wb->SaveAs("C:\\temp\\testing_filet.xlsx");

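Answer 1 (score: 0)

This is a problem with line-ending characters.

There are three conventions for marking the end of a line: \n on Unix, \r\n on Windows, and \r on classic Mac OS. It looks like your script assumes the Mac convention, while the input and output use the Windows convention.

As a result, after the input is read, every line except the first ends up with a leading \n. As long as the same is true of the output lines before they are written with \r, you end up with an output file whose lines are perfectly separated by \r\n. Obviously, it is better to make your script aware of the line-ending convention the input uses, and to make sure it uses the same convention both for splitting lines and for assembling the output.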
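
One way to follow that advice is to strip whatever terminator each line arrives with and then add a single, explicitly chosen one on output. The sketch below is only an illustration: `strip_eol` is a hypothetical helper name, the sample lines are made up, and the choice of "\r\n" for output is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing line terminator,
# whether it is \r\n (Windows), \n (Unix), or \r (classic Mac OS).
sub strip_eol {
    my ($line) = @_;
    $line =~ s/(?:\r\n|\n|\r)\z//;
    return $line;
}

# Sample input lines carrying all three conventions (illustrative data).
my @raw = (
    "GIRAFFE\tJAMES LE\r\n",
    "HIPPO\tJACKIE LEAN\n",
    "ZEBRA\tJAMES LEHERN\r",
);

my @clean = map { strip_eol($_) } @raw;

# Re-join with one terminator chosen by the script (an assumption here).
my $eol    = "\r\n";
my $output = join '', map { $_ . $eol } @clean;

print $output;
```

Unlike `chomp`, which removes only the current value of `$/`, the substitution handles all three conventions, so the rest of the script never sees a stray \r or \n.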