读取日期从xls到csv通过Perl

时间:2014-11-14 16:39:50

标签: string excel perl date csv

我有一批带有

等行的excel文件
1/13/04 21

我正在尝试将它们转换为.csv,但发现该行已转换为

36537,21

事实证明这是excel存储规则的副作用。 Excel应该将日期存储为自1900年1月1日以来的天数。根据该规则,这是一个错误的整数,对应于2001年1月12日而不是2004年1月13日(1/13/04表示的日期)。

  • Excel究竟如何犯这个错误?
  • 如何获取原始未格式化的值,并在此处回避转换?

这是代码的粗略草图:

my $xlsparser = Spreadsheet::ParseExcel->new();
my $xlsbook = $xlsparser->Parse('xls_test.xls');
my $xls = $xlsbook->{Worksheet}[0];
my $csv = '';

# then a loop over rows and columns with...
  my $cell = $xls->get_cell( $row, $col );
  $cellcon = $cell->unformatted();
  $csv .= $cellcon; 

如果我的论述不够清楚或者你无法重现这个问题,这里有一个最小的数据集和脚本可以为我重现:

https://dl.dropboxusercontent.com/u/58760/softwareGrr/xls_example.pl https://dl.dropboxusercontent.com/u/58760/softwareGrr/junk.xls

2 个答案:

答案 0 :(得分:1)

如果要将Excel日期序列值格式36537,21转换为perl中的时间/日期变量,则可以使用自己的函数转换日期。 功能下方

sub date2excelvalue {
  my($day1, $month, $year, $hour, $min, $sec) = @_;
  my @cumul_d_in_m = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365);
  my $doy = $cumul_d_in_m[$month - 1] + $day1;

  #
  full years + your day
  for my $y(1900..$year) {
    if ($y == $year) {
      if ($month <= 2) {

        #
        dont add manually extra date
        if inJanuary or February
        last;
      }
      if ((($y % 4 == 0) && ($y % 100 != 0)) || ($y % 400 == 0) || ($y == 1900)) {
        $doy++;#
        leap year
      }
    } else {#
      full years
      $doy += 365;
      if ((($y % 4 == 0) && ($y % 100 != 0)) || ($y % 400 == 0) || ($y == 1900)) {
        $doy++;#
        leap year
      }

    }
  }#
  end
  for y# calculate second parts as a fraction of 86400 seconds
  my $excel_decimaltimepart = 0;
  my $total_seconds_from_time = ($hour * 60 * 60 + $min * 60 + $sec);
  if ($total_seconds_from_time == 86400) {
    $doy++;#
    just add a day
  } else {#
    add decimal in excel
    $excel_decimaltimepart = $total_seconds_from_time / (86400);
    $excel_decimaltimepart = ~s / 0\. //;
  }
  return "$doy\.$excel_decimaltimepart";

}

sub excelvalue2date {
  my($excelvalueintegerpart, $excelvaluedecimalpart) = @_;
  my @cumul_d_in_m = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365);
  my @cumul_d_in_m_leap = (0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366);
  my @cumul_d_in_m_selected;
  my($day1, $month, $year, $hour, $min, $sec);
  $day1 = 0;#
  all days all years
  my $days_in_year;
  my $acumdays_per_month;
  my $daysinmonth;
  my $day;

  #
  full years + your day
  for my $y(1900. .3000) {
    my $leap_year = 0;#
    leap year
    my $leap_year_mask = 0;#
    leap year
    if ((($y % 4 == 0) && ($y % 100 != 0)) || ($y % 400 == 0) || ($y == 1900)) {
      $leap_year = 1;#
      leap year
      @cumul_d_in_m_selected = @cumul_d_in_m_leap;

    } else {
      $leap_year = 0;#
      leap year
      @cumul_d_in_m_selected = @cumul_d_in_m;
    }

    if (($day1 + (365 + $leap_year)) > $excelvalueintegerpart) {

      #
      found this year $y
      $year = $y;
      print "year $y\n";

      $days_in_year = $excelvalueintegerpart - $day1;
      $acumdays_per_month = 0;
      print "excelvalueintegerpart  $excelvalueintegerpart\n";
      print "day1  $day1\n";
      print "daysinyear $days_in_year\n";
      for my $i(0..$# cumul_d_in_m) {
        if ($i == $# cumul_d_in_m) {
          $month = $i + 1;#
          month 12 December
          $day = $days_in_year - $cumul_d_in_m_selected[$i];
          last;

        } else {

          if (($days_in_year > ($cumul_d_in_m_selected[$i])) && ($days_in_year <= ($cumul_d_in_m_selected[$i + 1]))) {
            $month = $i + 1;
            $day = $days_in_year - $cumul_d_in_m_selected[$i];
            last;
          }

        }

      }#
      end
      for $i months

      # end year
      last;

    } else {#
      full years
      $day1 += (365 + $leap_year);
    }

  }#
  end
  for years interger part comparator

  my $total_seconds_inaday;
  $total_seconds_inaday = "0\.$excelvaluedecimalpart" * 86400;

  $sec = $total_seconds_inaday;
  $hour = int($sec / (60 * 60));
  $sec -= $hour * (60 * 60);
  $min = int($sec / 60);
  $sec -= $min * (60);
  $sec = int($sec);
  return ($day, $month, $year, $hour, $min, $sec);

}
my $excelvariable = date2excelvalue(1, 3, 2018, 14, 14, 30);
print "Excel variable: $excelvariable\n";
my($integerpart, $decimalwithoutzero) = ($1, $2) if ($excelvariable = ~m / (\d + )\.(\d + ) / );
my($day1, $month, $year, $hour, $min, $sec) = excelvalue2date($integerpart, $decimalwithoutzero);
print "Excel Date from value: $day1, $month, $year, $hour, $min, $sec\n";

享受吧!

答案 1 :(得分:0)

有问题的一行是

$cellcon = $cell->unformatted();

除非有人能提供更好的解释,否则我会将此视为错误。我替换的行是

$cellcon = $cell->Value;