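PhpSpreadsheet - Chunking data across multiple sheets

Date: 2018-08-28 13:41:46

Tags: php phpexcel phpspreadsheet

I need to read an xlsx file with 10 sheets, each with about 3K rows.

Is there a way to loop over each worksheet and read its rows in chunks?

Here is an example of what I mean: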

public function import($file)
{
    $inputFileType = IOFactory::identify($file);
    $reader = IOFactory::createReader($inputFileType);

    //My ChunkReadFilter is exactly the same of the PhpSpreadsheet examples
    $chunkFilter = new ChunkReadFilter();
    $reader->setReadFilter($chunkFilter);

    $chunkSize = 100;

    $spreadsheet = $reader->load($file);

    $loadedSheetNames = $spreadsheet->getSheetNames();

    foreach ($loadedSheetNames as $sheetIndex => $loadedSheetName) {
        $sheet = $spreadsheet->getSheet($sheetIndex);

        //$highestRow = $sheet->getHighestRow(); //Is returning 1 as result
        $highestRow = 3000;

        for ($startRow = 1; $startRow <= $highestRow; $startRow += $chunkSize) {
            /**  Tell the Read Filter which rows we want this iteration  **/
            $chunkFilter->setRows($startRow, $chunkSize);

            $sheetData = $sheet->toArray(null, true, false, true);
            var_dump($sheetData);
        }

    }
}

var_dump($sheetData); prints all of the sheet's data, not just one chunk.

So, how can I read each sheet's data and chunk its rows?

I am using "phpoffice/phpspreadsheet": "^1.4"

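2 Answers:

Answer 0: (score: 1)

I am still new to this, but I tried a solution that may help here:

We can read the Excel file in chunks, as mentioned in the comments above, and still save memory. To do that, create the reader inside the loop and release it at the end of each iteration, as shown below: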

// Define how many rows we want to read for each "chunk"
$chunkSize = 1000;      

// Loop to read our worksheet in "chunk size" blocks
for ($startRow = 1; $startRow <= $rawRows; $startRow += $chunkSize) {
    // Create a new Reader of the type defined in $inputFileType
    $reader = IOFactory::createReader($inputFileType);

    // Create a new Instance of our Read Filter
    $chunkFilter = new Chunk();

    // Tell the Reader that we want to use the Read Filter that we've Instantiated
    $reader->setReadFilter($chunkFilter);

    // Tell the Read Filter, the limits on which rows we want to read this iteration
    $chunkFilter->setRows($startRow, $chunkSize);
    // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
    $spreadsheet = $reader->load($inputFileName);
    .....
    // process the file
    .....

    // then release the memory: break the worksheet/workbook references
    // first, otherwise the objects may not be freed
    $spreadsheet->disconnectWorksheets();
    unset($spreadsheet);

    // the reader only needs unset(); calling __destruct() by hand is not required
    unset($reader);
}

For large sheets, this keeps only one chunk in memory at a time, so the memory limit is not exceeded.
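The loop above relies on $rawRows, which is not defined in the snippet. It also helps to remember that a read filter is only consulted while load() runs, so each chunk needs its own load() call. As a minimal sketch, the list of (sheet, start row) pairs to load could be planned up front. Here chunkPlan() and the sample sheet names and row counts are hypothetical; the input array is assumed to have the shape returned by the reader's listWorksheetInfo() method (entries with 'worksheetName' and 'totalRows' keys).

```php
<?php
/**
 * Yield one [sheetName, startRow] pair per chunk to load.
 *
 * $worksheetInfo is assumed to look like the result of
 * $reader->listWorksheetInfo($inputFileName): a list of arrays
 * containing at least 'worksheetName' and 'totalRows'.
 */
function chunkPlan(array $worksheetInfo, int $chunkSize, int $firstDataRow = 2): Generator
{
    foreach ($worksheetInfo as $info) {
        // Row 1 is treated as the heading row, so data chunks start at $firstDataRow
        for ($startRow = $firstDataRow; $startRow <= $info['totalRows']; $startRow += $chunkSize) {
            yield [$info['worksheetName'], $startRow];
        }
    }
}

// Hard-coded stand-in for $reader->listWorksheetInfo($inputFileName)
$worksheetInfo = [
    ['worksheetName' => 'Sheet1', 'totalRows' => 25],
    ['worksheetName' => 'Sheet2', 'totalRows' => 8],
];

$chunkSize = 10;
$plan = iterator_to_array(chunkPlan($worksheetInfo, $chunkSize), false);

foreach ($plan as [$sheetName, $startRow]) {
    echo $sheetName, ': rows ', $startRow, ' to ', $startRow + $chunkSize - 1, PHP_EOL;
}
```

Inside the real loop, each pair would presumably be followed by $reader->setLoadSheetsOnly($sheetName), $chunkFilter->setRows($startRow, $chunkSize) and a fresh $reader->load($inputFileName), so that only one sheet and one chunk are held in memory at a time.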

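Please let me know if this helps.

Answer 1: (score: 0)

I completely missed your goal (the question was not very clear), so I have completely changed my answer. Suppose you can loop over multiple worksheets with the following code: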

// .... add helper here....
$helper->log('Loading file ' . pathinfo($inputFileName, PATHINFO_BASENAME) . ' using IOFactory with a defined reader type of ' . $inputFileType);
$reader = IOFactory::createReader($inputFileType);

// Define how many rows we want for each "chunk"
$chunkSize = 10;

// Loop to read our worksheet in "chunk size" blocks
for ($startRow = 2; $startRow <= 50 ; $startRow += $chunkSize) {
    // ..... use the helper ...
    $helper->log('Loading WorkSheet using configurable filter for headings row 1 and for rows ' . $startRow . ' to ' . ($startRow + $chunkSize - 1));
    // Create a new Instance of our Read Filter, passing in the limits on which rows we want to read
    $chunkFilter = new ChunkReadFilter($startRow, $chunkSize);
    // Tell the Reader that we want to use the new Read Filter that we've just Instantiated
    $reader->setReadFilter($chunkFilter);
    // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
    $spreadsheet = $reader->load($inputFileName);

    $sheetCount = $spreadsheet->getSheetCount();

    for ($i = 0; $i < $sheetCount; $i++) {
        $sheet = $spreadsheet->getSheet($i);

        // ...not what you want, but I leave this here
        $higestRow = $sheet->getHighestRow();
        echo "<p> Sheet n. ".$i. "  highest row is:" . ($higestRow) . "</p>";

        $sheetData = $sheet->toArray(null, true, true, true);

        var_dump($sheetData);
    }
}

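...to achieve your goal, I think you need to import use PhpOffice\PhpSpreadsheet\Reader\IReadFilter; and build your own filter, so that you can set highestRow as needed inside the for loop. This code is taken from the documentation; the public function setRows() is, I think, where you need to put your own code, and the filter is then updated inside the for loop: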

namespace Samples\Sample12;

use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Reader\IReadFilter;

require __DIR__ . '/../Header.php';

$inputFileType = 'Xls';
$inputFileName = __DIR__ . '/sampleData/example2.xls';

/**  Define a Read Filter class implementing IReadFilter  */
class ChunkReadFilter implements IReadFilter
{
    private $startRow = 0;

    private $endRow = 0;

    /**
     * Set the list of rows that we want to read.
     *
     * @param mixed $startRow
     * @param mixed $chunkSize
     */
    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        //  Only read the heading row, and the rows that are configured in $this->startRow and $this->endRow
        if (($row == 1) || ($row >= $this->startRow && $row < $this->endRow)) {
            return true;
        }

        return false;
    }
}

$helper->log('Loading file ' . pathinfo($inputFileName, PATHINFO_BASENAME) . ' using IOFactory with a defined reader type of ' . $inputFileType);
// Create a new Reader of the type defined in $inputFileType
$reader = IOFactory::createReader($inputFileType);

// Define how many rows we want to read for each "chunk"
$chunkSize = 10;
// Create a new Instance of our Read Filter
$chunkFilter = new ChunkReadFilter();

// Tell the Reader that we want to use the Read Filter that we've  Instantiated
$reader->setReadFilter($chunkFilter);

// Read each sheet's row count without loading any cell data.
// (A filtered load() at this point would only see row 1, so
// getHighestRow() would return 1 and the chunk loop would never run.)
$worksheetInfo = $reader->listWorksheetInfo($inputFileName);

foreach ($worksheetInfo as $i => $info) {
    // ...we get the highest row here, now
    $higestRow = $info['totalRows'];

    for ($startRow = 2; $startRow <= $higestRow; $startRow += $chunkSize) {
        // ..just to check the output
        echo "<p> Sheet n. ".$i. "  highest row is:" . ($higestRow) . "</p>";
        $helper->log('Loading WorkSheet using configurable filter for headings row 1 and for rows ' . $startRow . ' to ' . ($startRow + $chunkSize - 1));
        // Tell the Read Filter, the limits on which rows we want to read this iteration
        $chunkFilter->setRows($startRow, $chunkSize);
        // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
        $spreadsheet = $reader->load($inputFileName);

        // Do some processing here

        // Read the sheet being iterated, not whichever sheet happens to be active
        $sheetData = $spreadsheet->getSheet($i)->toArray(null, true, true, true);
        var_dump($sheetData);
    }

}
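
The ChunkReadFilter above ignores the $worksheetName argument of readCell(), so every load() reads the same row range from all sheets. A sheet-aware variant might look like the sketch below. The class name SheetChunkReadFilter and its setChunk() method are hypothetical and not part of PhpSpreadsheet. In a real project the class would be declared with implements IReadFilter; the interface is left off here only so the sketch runs without the library installed.

```php
<?php
/**
 * Hypothetical sheet-aware variant of ChunkReadFilter.
 * With PhpSpreadsheet installed, declare it as:
 *   class SheetChunkReadFilter implements \PhpOffice\PhpSpreadsheet\Reader\IReadFilter
 */
class SheetChunkReadFilter
{
    private $sheetName = '';
    private $startRow = 0;
    private $endRow = 0;

    /** Select the sheet and the row window to read on the next load(). */
    public function setChunk($sheetName, $startRow, $chunkSize)
    {
        $this->sheetName = $sheetName;
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize; // exclusive upper bound
    }

    public function readCell($column, $row, $worksheetName = ''): bool
    {
        // Skip every cell that belongs to a different sheet
        if ($worksheetName !== $this->sheetName) {
            return false;
        }

        // Keep the heading row plus the rows inside the current window
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

$filter = new SheetChunkReadFilter();
$filter->setChunk('Sheet2', 12, 10);

var_dump($filter->readCell('A', 1, 'Sheet2'));  // heading row of the selected sheet
var_dump($filter->readCell('A', 12, 'Sheet2')); // first row of the window
var_dump($filter->readCell('A', 22, 'Sheet2')); // endRow itself is excluded
var_dump($filter->readCell('A', 12, 'Sheet1')); // same row, different sheet
```

With this kind of filter, the outer loop would call setChunk() with the sheet name and start row before each load(), instead of calling setRows() alone.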