Getting OOM when processing an .xls file with PHPExcel

Time: 2017-03-31 14:17:36

Tags: php excel memory phpexcel

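I am aware of "How to read large worksheets from large Excel files (27MB+) with PHPExcel?" and I have tried to implement the chunked reading discussed in that question, but I am still hitting OOM errors. The file itself is just under 5 MB and has 9000+ rows (yes, it's over 9000!), with columns ranging from A to V.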

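I would prefer that users not have to edit this file before uploading and processing it, because at the moment this is a manual process and I would like to replace it entirely with automation. The file is in xls format, which PHPExcel identifies as Excel5.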

My PHP memory limit is currently set to 128M, and the script runs on Ubuntu Server.

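No matter what chunk size I set, I eventually run out of memory. The script actually gets further with a chunk size of 200 (I can reach about row 7000), whereas with a chunk size of 1 the OOM occurs around row 370. So I believe 'something' is being stored or loaded into memory on every iteration of the chunk read and is never discarded, which eventually leads to the OOM, but I cannot see where this happens.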
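One way to test that suspicion is to log memory growth per iteration. The following is only a sketch: `formatMb()` and `profileChunks()` are hypothetical helpers, not part of PHPExcel, and the workload is a stand-in that deliberately keeps data alive between iterations. In the real script the callback would wrap one `load()` / `rangeToArray()` / `disconnectWorksheets()` / `unset()` cycle.

```php
<?php
// Hypothetical helper (not part of PHPExcel): formats a byte count as MB.
function formatMb(int $bytes): string
{
    return number_format($bytes / 1048576, 2) . ' MB';
}

// Hypothetical helper: runs $work once per chunk and records how much
// memory is still held after each run.
function profileChunks(callable $work, int $chunks): array
{
    $deltas = array();
    for ($i = 0; $i < $chunks; $i++) {
        $before = memory_get_usage();
        $work($i);
        gc_collect_cycles(); // collect unreachable cycles before measuring
        $deltas[] = memory_get_usage() - $before;
    }
    return $deltas;
}

// Stand-in workload: each call retains roughly 100 KB, imitating a leak.
$retained = array();
$deltas = profileChunks(function ($i) use (&$retained) {
    $retained[] = str_repeat('x', 100000);
}, 5);

foreach ($deltas as $i => $delta) {
    echo "chunk $i: +", formatMb($delta),
        ", peak ", formatMb(memory_get_peak_usage()), "\n";
}
```

If the per-chunk delta stays near zero while the peak still climbs to the limit, the problem is more likely the cost of a single load than memory retained across iterations.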

I am very much an amateur programmer; this is just a sideline to my managed-services role at work, an attempt to make our lives easier.

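The whole point of this code is to read in the Excel file, filter out the "crap", and then save the result as a CSV (for now I just dump it to the screen instead of writing the CSV). The way things are going, I am tempted to call excel2csv from the PHP script and then try to clean up the CSV instead... but that feels like giving up when I may be fairly close to a solution.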
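For the CSV step, each chunk can be appended to an open file handle with `fputcsv()` rather than collected in memory. This is a minimal sketch under assumptions: `isJunkRow()` and `appendChunkToCsv()` are hypothetical names, "junk" is assumed to mean a row whose cells are all empty, and the sample data only imitates the nested arrays held in `$sheetData`.

```php
<?php
// Hypothetical filter rule (an assumption): a row is junk if every cell is empty.
function isJunkRow(array $row): bool
{
    foreach ($row as $cell) {
        if ($cell !== null && trim((string) $cell) !== '') {
            return false;
        }
    }
    return true;
}

// Appends the non-junk rows of one chunk to an open CSV handle and
// returns how many rows were written.
function appendChunkToCsv($handle, array $rows): int
{
    $written = 0;
    foreach ($rows as $row) {
        if (isJunkRow($row)) {
            continue;
        }
        // Delimiter, enclosure and escape are passed explicitly.
        fputcsv($handle, array_values($row), ',', '"', '\\');
        $written++;
    }
    return $written;
}

// Sample data standing in for one chunk of worksheet rows.
$chunk = array(
    array('Name', 'Qty'),
    array(null, ''),
    array('Widget, large', 3),
);

$out = fopen('php://memory', 'w+'); // a real script would open the target .csv file
$count = appendChunkToCsv($out, $chunk);
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo $csv;
```

Because each chunk is written out immediately, the rows themselves never need to accumulate across loop iterations.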

<?php

error_reporting(E_ALL);
set_time_limit(0);
date_default_timezone_set('Europe/London');

require_once 'Classes/PHPExcel/IOFactory.php';

class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
        private $_startRow = 0;
        private $_endRow = 0;
        private $_columns = array();

        /**  Set the list of rows that we want to read  */
        public function setRows($startRow, $chunkSize, $columns) {
                $this->_startRow        = $startRow;
                $this->_endRow          = $startRow + $chunkSize;
                $this->_columns         = $columns;
        }
        public function readCell($column, $row, $worksheetName = '') {
                //  Only read the heading row, and the rows that are configured in $this->_startRow$
                if ($row >= $this->_startRow && $row < $this->_endRow) {
                        if(in_array($column,$this->_columns)) {
                                return true;
                        }
                }
                return false;
        }
}
$target_dir = "uploads/";
$file_name = $_POST["file_name"];

$full_path = $target_dir . $file_name;

echo "Processing ". $file_name . '; <br>';

ob_flush();
flush();


/** As files may be large in memory, use a temp file to handle them
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array( 'memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
**/

$inputFileName = $full_path;

echo 'Excel reader started<br/>';

/** First we should get the type of file **/

$filetype = PHPExcel_IOFactory::identify($inputFileName);

echo 'File of type: ' . $filetype . ' found<br/>';

/** Load $inputFileName to a PHPExcel Object  - https://github.com/PHPOffice/PHPExcel/blob/develop/$


/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 1;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

$objReader = PHPExcel_IOFactory::createReader($filetype);

/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/
$objReader->setReadFilter($chunkFilter);
/**  Loop to read our worksheet in "chunk size" blocks  **/
for ($startRow = 2; $startRow <= 65000; $startRow += $chunkSize) {
        $endRow = $startRow+$chunkSize-1;
        echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startR$
        /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/
        $chunkFilter->setRows($startRow,$chunkSize,range('A','T'));
        /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/
        $objPHPExcel = $objReader->load($inputFileName);
        //      Do some processing here
//      $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
        $sheetData = $objPHPExcel->getActiveSheet()->rangeToArray("A$startRow:T$endRow");
        var_dump($sheetData);
        // Clear the variable to not go over memory!
        $objPHPExcel->disconnectWorksheets();
        unset ($sheetData);
        unset ($objPHPExcel);
        ob_flush();
        flush();

        echo '<br /><br />';
}


/**  This loads the entire file,  crashing with OOM

try {
        $objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
        echo 'loaded sheet into memory<br>';
} catch(PHPExcel_Reader_Exception $e) {
    die('Error loading file: '.$e->getMessage());
}

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');

echo 'Saving sheet as CSV<br>';

    $objWriter->setSheetIndex(0);
    $objWriter->save('./uploads/'.$file_name.'.csv');
    echo 'Processed 1 sheet';
    ob_flush();
flush();

**/

echo "<body><table>\n\n";


/**
$f = fopen($file_name, "r");
while (($line = fgetcsv($f)) !== false) {
        echo "<tr>";
        foreach ($line as $cell) {
                echo "<td>" . htmlspecialchars($cell) . "</td>";
        }
        echo "</tr>\n";
}
fclose($f);
**/

echo "\n</table></body></html>";

?>

The error shown in the Apache log is:

[Fri Mar 31 15:35:27.982697 2017] [:error] [pid 1059] [client 10.0.2.2:53866] PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 45056 bytes) in /var/www/html/Classes/PHPExcel/Shared/OLERead.php on line 93, referer: http://localhost:8080/upload.php
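The 134217728 bytes in that message is exactly the configured 128M limit (128 × 1024 × 1024). As a small sketch, the conversion can be checked with a helper; `iniSizeToBytes()` is a hypothetical name, not a built-in PHP function.

```php
<?php
// Hypothetical helper: converts php.ini shorthand such as "128M" into bytes.
function iniSizeToBytes(string $value): int
{
    $value  = trim($value);
    $number = (int) $value;                    // leading digits, e.g. 128
    $unit   = strtoupper(substr($value, -1));  // trailing unit letter, if any

    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number; // plain byte count, or -1 for "no limit"
    }
}

echo iniSizeToBytes('128M'), "\n";
```

Passing `ini_get('memory_limit')` to the same helper shows the limit that actually applies to the running script.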

1 answer:

Answer 0: (score: 1)

unset ($objPHPExcel);

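If you check the PHPExcel documentation, you will see that because of the cyclic references between the spreadsheet, its worksheets and their cells, $objPHPExcel cannot be unset cleanly this way, and the result is a memory leak. The recommendation is to break those cyclic references first.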

$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);

There will still be some memory leakage, but this should allow more memory to be freed between chunks.
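The effect of a parent/child cycle on `unset()` can be seen in plain PHP. The sketch below uses hypothetical stand-in classes (`DemoWorkbook`, `DemoSheet`), not the PHPExcel classes, and its `disconnect()` method only imitates the idea behind `disconnectWorksheets()`.

```php
<?php
// Hypothetical stand-in for a worksheet; it keeps a pointer to its parent.
class DemoSheet
{
    public $parent;

    public function __construct(DemoWorkbook $parent)
    {
        $this->parent = $parent;
    }
}

// Hypothetical stand-in for a workbook holding about 1 MB of data.
class DemoWorkbook
{
    public $sheets = array();
    public $payload;

    public function __construct()
    {
        $this->payload  = str_repeat('x', 1000000);
        $this->sheets[] = new DemoSheet($this); // workbook <-> sheet cycle
    }

    // Breaks the cycle so that reference counting alone can free everything.
    public function disconnect()
    {
        foreach ($this->sheets as $sheet) {
            $sheet->parent = null;
        }
        $this->sheets = array();
    }
}

$base = memory_get_usage();

// Case 1: unset() alone. The cycle keeps both objects alive.
$wb = new DemoWorkbook();
unset($wb);
$afterLeak = memory_get_usage();
$leaked = $afterLeak - $base;

// Case 2: disconnect first. The memory is released as soon as unset() runs.
$wb = new DemoWorkbook();
$wb->disconnect();
unset($wb);
$growth = memory_get_usage() - $afterLeak;

// The cycle collector can still reclaim the objects left over from case 1.
$collected = gc_collect_cycles();
$afterGc = memory_get_usage();

echo "held after plain unset: $leaked bytes\n";
echo "extra held after disconnect + unset: $growth bytes\n";
echo "objects reclaimed by gc_collect_cycles(): $collected\n";
```

In the chunk loop, calling `gc_collect_cycles()` after `unset($objPHPExcel)` is a cheap extra step to try, though whether it helps here depends on what is actually holding the memory.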