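I need to read an xlsx file that contains 10 worksheets, each with roughly 3,000 rows.
Is there a way to loop over each worksheet and read its rows in chunks?
Here is an example of what I mean: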
public function import($file)
{
    $inputFileType = IOFactory::identify($file);
    $reader = IOFactory::createReader($inputFileType);
    // My ChunkReadFilter is exactly the same as the one in the PhpSpreadsheet examples
    $chunkFilter = new ChunkReadFilter();
    $reader->setReadFilter($chunkFilter);
    $chunkSize = 100;
    $spreadsheet = $reader->load($file);
    $loadedSheetNames = $spreadsheet->getSheetNames();
    foreach ($loadedSheetNames as $sheetIndex => $loadedSheetName) {
        $sheet = $spreadsheet->getSheet($sheetIndex);
        // $highestRow = $sheet->getHighestRow(); // Is returning 1 as result
        $highestRow = 3000;
        for ($startRow = 1; $startRow <= $highestRow; $startRow += $chunkSize) {
            /** Tell the Read Filter which rows we want this iteration **/
            $chunkFilter->setRows($startRow, $chunkSize);
            $sheetData = $sheet->toArray(null, true, false, true);
            var_dump($sheetData);
        }
    }
}
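The call to var_dump($sheetData); prints all of the worksheet data, not just one chunk of rows.
So how can I read each worksheet's data and split the rows into chunks?
I am using "phpoffice/phpspreadsheet": "^1.4".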
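Answer 0 (score: 1)
I am still new to this, but I tried a solution that may help here.
We can read the Excel sheets chunk by chunk, as suggested in the comments above, and still save memory: create the reader inside the loop and release it at the end of each iteration, as shown below: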
// Define how many rows we want to read for each "chunk"
$chunkSize = 1000;
// Loop to read our worksheet in "chunk size" blocks
for ($startRow = 1; $startRow <= $rawRows; $startRow += $chunkSize) {
    // Create a new Reader for this file type
    $reader = IOFactory::createReader($inputFileType);
    // Create a new Instance of our Read Filter
    $chunkFilter = new Chunk();
    // Tell the Reader that we want to use the Read Filter that we've Instantiated
    $reader->setReadFilter($chunkFilter);
    // Tell the Read Filter, the limits on which rows we want to read this iteration
    $chunkFilter->setRows($startRow, $chunkSize);
    // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
    $spreadsheet = $reader->load($inputFileName);
    .....
    // process the file
    .....
    // then release the memory: disconnectWorksheets() breaks the references
    // between the workbook and its sheets so they can be garbage-collected
    $spreadsheet->disconnectWorksheets();
    unset($spreadsheet, $reader);
}
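This way a large sheet only ever occupies one chunk's worth of memory at a time, so the script stays under the memory limit.
Please let me know if this helps.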
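For what it is worth, here is a rough sketch of how the same idea might be extended to every worksheet in the file. The function name importInChunks and the $handleChunk callback are hypothetical names introduced only for this illustration, and the sketch assumes PHP 7.2 or later with Composer's autoloader already loaded. It uses the reader's listWorksheetInfo() method to obtain each sheet's row count (the $rawRows value above) without loading cell data, and setLoadSheetsOnly() so that each load touches a single sheet. The filter here is an inline anonymous class rather than the Chunk class above.

```php
<?php
// Assumes PhpSpreadsheet is installed and Composer's autoloader is already loaded.
use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Reader\IReadFilter;

/**
 * Hypothetical helper: reads every sheet of $file in chunks of $chunkSize rows
 * and passes each chunk to $handleChunk(string $sheetName, array $rows).
 */
function importInChunks(string $file, int $chunkSize, callable $handleChunk): void
{
    $inputFileType = IOFactory::identify($file);

    // Sheet names and dimensions, obtained without loading any cell data.
    $sheetInfos = IOFactory::createReader($inputFileType)->listWorksheetInfo($file);

    foreach ($sheetInfos as $info) {
        $sheetName = $info['worksheetName'];
        $rawRows = (int) $info['totalRows'];

        for ($startRow = 1; $startRow <= $rawRows; $startRow += $chunkSize) {
            $endRow = min($startRow + $chunkSize - 1, $rawRows);

            // Read filter that accepts only rows $startRow..$endRow (inclusive).
            $chunkFilter = new class($startRow, $endRow) implements IReadFilter {
                private $startRow;
                private $endRow;

                public function __construct(int $startRow, int $endRow)
                {
                    $this->startRow = $startRow;
                    $this->endRow = $endRow;
                }

                public function readCell($columnAddress, $row, $worksheetName = ''): bool
                {
                    return $row >= $this->startRow && $row <= $this->endRow;
                }
            };

            // A fresh reader per chunk, restricted to the current sheet.
            $reader = IOFactory::createReader($inputFileType);
            $reader->setLoadSheetsOnly($sheetName);
            $reader->setReadFilter($chunkFilter);
            $spreadsheet = $reader->load($file);

            $sheet = $spreadsheet->getSheetByName($sheetName);
            if ($sheet !== null) {
                // Export only the rows of this chunk, keyed by row number and column letter.
                $range = 'A' . $startRow . ':' . $info['lastColumnLetter'] . $endRow;
                $handleChunk($sheetName, $sheet->rangeToArray($range, null, true, false, true));
            }

            // Release the workbook before loading the next chunk.
            $spreadsheet->disconnectWorksheets();
            unset($spreadsheet, $reader);
        }
    }
}
```

Calling importInChunks($file, 100, $callback) would then hand the callback at most 100 rows at a time. Note the trade-off: the file is parsed again for every chunk, so this spends extra time to keep memory use low.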
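Answer 1 (score: 0)
I completely missed your goal (the question is not very clear), so I have rewritten my answer from scratch. Suppose you loop over the worksheets with the following code: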
// .... add helper here....
$helper->log('Loading file ' . pathinfo($inputFileName, PATHINFO_BASENAME) . ' using IOFactory with a defined reader type of ' . $inputFileType);
$reader = IOFactory::createReader($inputFileType);
// Define how many rows we want for each "chunk"
$chunkSize = 10;
// Loop to read our worksheet in "chunk size" blocks
for ($startRow = 2; $startRow <= 50; $startRow += $chunkSize) {
    // ..... use the helper ...
    $helper->log('Loading WorkSheet using configurable filter for headings row 1 and for rows ' . $startRow . ' to ' . ($startRow + $chunkSize - 1));
    // Create a new Instance of our Read Filter, passing in the limits on which rows we want to read
    $chunkFilter = new ChunkReadFilter($startRow, $chunkSize);
    // Tell the Reader that we want to use the new Read Filter that we've just Instantiated
    $reader->setReadFilter($chunkFilter);
    // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
    $spreadsheet = $reader->load($inputFileName);
    $sheetCount = $spreadsheet->getSheetCount();
    for ($i = 0; $i < $sheetCount; $i++) {
        $sheet = $spreadsheet->getSheet($i);
        // ...not what you want, but I leave this here
        $highestRow = $sheet->getHighestRow();
        echo "<p> Sheet n. " . $i . " highest row is:" . $highestRow . "</p>";
        $sheetData = $sheet->toArray(null, true, true, true);
        var_dump($sheetData);
    }
}
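...to achieve your goal, I think you need to import the interface with use PhpOffice\PhpSpreadsheet\Reader\IReadFilter; and build your own filter, so that you can set highestRow inside the for loop as needed.
The code below is taken from the documentation. I think the public function setRows() is where you need to put your own logic, and the filter is then reconfigured inside the for loop: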
namespace Samples\Sample12;

use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Reader\IReadFilter;

require __DIR__ . '/../Header.php';

$inputFileType = 'Xls';
$inputFileName = __DIR__ . '/sampleData/example2.xls';

/** Define a Read Filter class implementing IReadFilter */
class ChunkReadFilter implements IReadFilter
{
    private $startRow = 0;
    private $endRow = 0;

    /**
     * Set the list of rows that we want to read.
     *
     * @param mixed $startRow
     * @param mixed $chunkSize
     */
    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        // Exclusive upper bound: the chunk covers rows $startRow to $startRow + $chunkSize - 1
        $this->endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Only read the heading row, and the rows that are configured in $this->startRow and $this->endRow
        if (($row == 1) || ($row >= $this->startRow && $row < $this->endRow)) {
            return true;
        }
        return false;
    }
}
$helper->log('Loading file ' . pathinfo($inputFileName, PATHINFO_BASENAME) . ' using IOFactory with a defined reader type of ' . $inputFileType);
// Create a new Reader of the type defined in $inputFileType
$reader = IOFactory::createReader($inputFileType);
// Define how many rows we want to read for each "chunk"
$chunkSize = 10;
// Create a new Instance of our Read Filter
$chunkFilter = new ChunkReadFilter();
// Tell the Reader that we want to use the Read Filter that we've Instantiated
$reader->setReadFilter($chunkFilter);
// Get each sheet's name and row count without loading cell data; calling load() here
// would apply the still-unconfigured filter, which only lets row 1 through
$worksheetInfo = $reader->listWorksheetInfo($inputFileName);
foreach ($worksheetInfo as $i => $info) {
    // ...we get the highest row here, now
    $highestRow = $info['totalRows'];
    for ($startRow = 2; $startRow <= $highestRow; $startRow += $chunkSize) {
        // ..just to check the output
        echo "<p> Sheet n. " . $i . " highest row is:" . $highestRow . "</p>";
        $helper->log('Loading WorkSheet using configurable filter for headings row 1 and for rows ' . $startRow . ' to ' . ($startRow + $chunkSize - 1));
        // Tell the Read Filter, the limits on which rows we want to read this iteration
        $chunkFilter->setRows($startRow, $chunkSize);
        // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
        $spreadsheet = $reader->load($inputFileName);
        // Do some processing here, on the sheet that belongs to this iteration
        $sheetData = $spreadsheet->getSheet($i)->toArray(null, true, true, true);
        var_dump($sheetData);
    }
}
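The row arithmetic in that loop is easy to get wrong, so here is a small library-free sketch of it. The function chunkBounds is a hypothetical helper made up for this illustration, not part of PhpSpreadsheet. It lists the inclusive first and last row of each chunk, using the same rule as setRows() above: a chunk starting at $startRow covers rows up to $startRow + $chunkSize - 1, capped at the sheet's highest row.

```php
<?php
/**
 * Hypothetical helper (not part of PhpSpreadsheet): returns the inclusive
 * [firstRow, lastRow] pairs that a chunked read loop would visit.
 */
function chunkBounds(int $firstDataRow, int $highestRow, int $chunkSize): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $bounds = [];
    for ($startRow = $firstDataRow; $startRow <= $highestRow; $startRow += $chunkSize) {
        // The last chunk may be shorter than $chunkSize.
        $bounds[] = [$startRow, min($startRow + $chunkSize - 1, $highestRow)];
    }

    return $bounds;
}

// Data rows 2..25 in chunks of 10 (row 1 is the heading row):
// yields the pairs [2, 11], [12, 21] and [22, 25].
print_r(chunkBounds(2, 25, 10));
```

If the highest row is lower than the first data row, for example when a sheet holds only its heading row, the helper returns an empty list and the read loop simply does not run.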