Is it possible to import and export a 70MB Excel file using the PHPExcel library?

Date: 2015-06-03 13:05:54

Tags: php phpexcel

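I have an Excel file with 3 columns, where column 2 contains email hyperlinks. I need to import this file and export it with only 2 columns: the first should contain the name and the second the email address, which means I have to split each hyperlink into a name and an email.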
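The core of the task is splitting a cell such as `John Smith <john@example.com>` into its two parts. Below is a minimal sketch of just that step in Python (the language of the first answer below), assuming the cell text has the form `Name <email>`. The function name `split_name_email` and the exact pattern are illustrative choices, not taken from the question's code. Unlike the question's PHP pattern, whose `[^\.]+` cannot capture a dot, this one also accepts names such as `John Q. Public`.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "Name <email>", optional whitespace before "<";
# the name part may contain dots.
NAME_EMAIL = re.compile(r"^\s*(.*?)\s*<([^<>\s]+@[^<>\s]+)>\s*$")


def split_name_email(text: str) -> Optional[Tuple[str, str]]:
    """Split 'John Smith <john@example.com>' into ('John Smith', 'john@example.com').

    Returns None when the text is not of the form 'Name <email>'.
    """
    match = NAME_EMAIL.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For example, `split_name_email("John Smith <john@example.com>")` returns `('John Smith', 'john@example.com')`, and text without an address in angle brackets returns `None`.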

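For the 31MB file, I changed `memory_limit` in php.ini to 2048MB and `max_execution_time` to 1200 seconds. With those settings I can import and export the 31MB Excel file successfully, but for the 70MB file the export runs for a very long time and then fails with the following error: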

Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 15667514 bytes) in /var/www/html/PHPExcel/Reader/Excel2007.php on line 327

Is it possible to import and export a 70MB Excel file with PHPExcel? Do I need to change the memory limit, the maximum execution time, or other settings in php.ini?

require "PHPExcel.php";
require "PHPExcel/IOFactory.php";

$inputFileName = 'xxx.xlsx';

$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);

$outputObj = new PHPExcel();

//  Get worksheet dimensions
$sheet = $objPHPExcel->getSheet(0);
$highestRow = $sheet->getHighestRow();

$outputObj->setActiveSheetIndex(0);
$outSheet = $outputObj->getActiveSheet();

//  Loop through each row of the worksheet in turn
for ($row = 2; $row <= $highestRow; $row++){ // As row 1 seems to be header
    //  Read cell B2, B3, etc.
    $line = $sheet->getCell('B' . $row)->getValue();

    preg_match("|([^\.]+)\ <([^>]+)>|", $line, $data);

    if(!empty($data))
    {
        // $data[1] will be name & $data[2] will be email
        $outSheet->setCellValue('A' . $row, $data[1]);
        $outSheet->setCellValue('B' . $row, $data[2]);  
    }

}

$objWriter = new PHPExcel_Writer_CSV($outputObj);
$objWriter->save("xxx.csv");

Note: I can export the Excel file without making any changes to php.ini.

3 answers:

Answer 0 (score: 7)

I found a solution: I did this task successfully in Python instead. Hope it helps someone. :)

# Time taken: 2 min 4 sec for a 69.9MB file.

import csv
import re
from openpyxl import load_workbook

location = 'big.xlsx'

# read_only=True makes openpyxl read rows lazily instead of loading the whole workbook.
wb = load_workbook(filename=location, read_only=True)
users_data = []
# pattern = r'^(.+?) <([^>].+)>$'      # matches "your name <email@email.com>"
# pattern_new = r'^(.+?)<([^>].+)>$'   # matches "your name<email@email.com>"
# pattern_email = r'([\w.-]+@[\w.-]+)' # extracts email from sentence

# Define the patterns to check on each string.
patterns = [r'^(.+?) <([^>].+)>$', r'^(.+?)<([^>].+)>$']

# Loop through all sheets in the XLSX file.
for ws in wb.worksheets:
    # Loop through each row in the current sheet.
    for row in ws.iter_rows():
        # We only need column B; skip the row if that cell is missing or empty.
        if len(row) < 2 or not row[1].value:
            continue
        val = None
        # Get column B, turning obfuscated "(at)"/"(dot)" back into "@"/".".
        data = str(row[1].value).replace("(at)", "@").replace("(dot)", ".")
        # Try each pattern; stop at the first match to avoid extra searches.
        for pattern in patterns:
            name_data = re.search(pattern, data)
            if name_data:
                # Matched "name <email>": keep both parts.
                val = [name_data.group(1), name_data.group(2)]
                break
        # No match: look for a bare email address, otherwise keep the text as is.
        if not val:
            name_data = re.search(r'([\w.-]+@[\w.-]+)', data)
            if name_data:
                val = [name_data.group(1)]
            else:
                val = [data]
        # Append the extracted fields to users_data.
        users_data.append(val)

# Read-only workbooks keep the file handle open until closed.
wb.close()

# Open the CSV file for writing (newline='' is what the csv module expects).
with open('big.csv', 'w', newline='', encoding='utf-8') as myfile:
    wr = csv.writer(myfile, dialect='excel', delimiter=',', quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    # Write each entry as one row, so name and email land in separate columns.
    for word in users_data:
        wr.writerow(word)
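The script above still collects every parsed row in `users_data` before writing, so its memory use grows with the file. A minimal sketch of a streaming variant, assuming the column B values arrive as an iterable: each value is parsed with the same two patterns and bare-email fallback, then written immediately. Unlike the answer, it always emits two columns (name, email), leaving a field empty when it cannot be found. `parse_contact` and `write_csv_streaming` are hypothetical helper names.

```python
import csv
import re
from typing import IO, Iterable, List

# The answer's two "name <email>" patterns, plus its bare-email fallback.
PATTERNS = [re.compile(r'^(.+?) <([^>].+)>$'), re.compile(r'^(.+?)<([^>].+)>$')]
EMAIL = re.compile(r'([\w.-]+@[\w.-]+)')


def parse_contact(value: object) -> List[str]:
    """Return [name, email]; either field may be '' if it cannot be found."""
    text = str(value).replace("(at)", "@").replace("(dot)", ".")
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return [match.group(1), match.group(2)]
    email = EMAIL.search(text)
    if email:
        return ["", email.group(1)]
    # Neither pattern nor a bare address matched: keep the text as the name.
    return [text, ""]


def write_csv_streaming(values: Iterable[object], out: IO[str]) -> int:
    """Parse each non-empty value and write it straight to `out` as one CSV row.

    Nothing is accumulated, so memory use does not grow with the number of rows.
    Returns the number of rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for value in values:
        if not value:
            continue  # same as the answer: skip empty cells
        writer.writerow(parse_contact(value))
        written += 1
    return written
```

With openpyxl, the values could come from a read-only worksheet, for example `(r[0] for r in ws.iter_rows(min_col=2, max_col=2, values_only=True))`, with the output file opened using `newline=''`.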

Answer 1 (score: 2)

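@Priyanka, you could also try Spout: https://github.com/box/spout. It works well with large files! You won't need to change php.ini, because it needs no more than 10MB of memory and should finish within the default time limit.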

You could do something like this:

require 'vendor/autoload.php'; // assuming Spout was installed with Composer

use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Writer\WriterFactory;
use Box\Spout\Common\Type;

$filePath = 'xxx.xlsx';
$reader = ReaderFactory::create(Type::XLSX);
$reader->open($filePath);

$writer = WriterFactory::create(Type::CSV);
$writer->openToFile('xxx.csv');

$rowCount = 0;
while ($reader->hasNextSheet()) {
    $reader->nextSheet();

    while ($reader->hasNextRow()) {
        $row = $reader->nextRow();
        $rowCount++;

        if ($rowCount === 1) {
            continue; // skip the header row
        }

        // Get the values you need from the current row
        // (values are 0-indexed: $row[0] is column A, $row[1] is column B).
        // For example:
        $name = $row[1];
        $email = $row[2];

        // Write the data to the CSV file
        $writer->addRow([$name, $email]);
    }
}

$reader->close();
$writer->close();

Give it a try! Hopefully it solves your problem :)
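Spout can stay within a small memory budget because it reads the sheet XML incrementally, whereas PHPExcel's `load()` builds an in-memory object for every cell, which is what exhausts the 2GB limit in the question. The same streaming idea can be sketched in Python with only the standard library, since an .xlsx file is a ZIP archive of XML parts. This illustrates the technique and is not Spout's code. It assumes the data sits in `xl/worksheets/sheet1.xml` (usually, but not always, the first sheet), that every cell carries an `r` reference such as `B2`, and it keeps the shared-string table fully in memory. `read_shared_strings` and `iter_column` are hypothetical names.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Iterator, List, Tuple

# SpreadsheetML main namespace used by .xlsx worksheet and shared-string parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_shared_strings(zf: zipfile.ZipFile) -> List[str]:
    """Load the shared-string table (cells with t="s" store an index into it)."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    strings = []
    with zf.open("xl/sharedStrings.xml") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "si":
                # Concatenate every <t> run inside the entry (plain or rich text).
                strings.append("".join(t.text or "" for t in elem.iter(NS + "t")))
                elem.clear()
    return strings


def iter_column(path, column: str = "B",
                sheet: str = "xl/worksheets/sheet1.xml") -> Iterator[Tuple[int, str]]:
    """Yield (row_number, text) for each non-empty cell in one column, streaming."""
    with zipfile.ZipFile(path) as zf:
        shared = read_shared_strings(zf)
        with zf.open(sheet) as f:
            sheet_data = None
            for event, elem in ET.iterparse(f, events=("start", "end")):
                if event == "start":
                    # Remember <sheetData> so finished rows can be detached from it.
                    if elem.tag == NS + "sheetData":
                        sheet_data = elem
                    continue
                if elem.tag == NS + "c":
                    ref = elem.get("r", "")
                    letters = ref.rstrip("0123456789")
                    if letters != column:
                        continue
                    cell_type = elem.get("t")
                    if cell_type == "inlineStr":
                        text = "".join(t.text or "" for t in elem.iter(NS + "t"))
                    else:
                        v = elem.find(NS + "v")
                        text = v.text if v is not None and v.text else ""
                        if cell_type == "s" and text:
                            text = shared[int(text)]
                    if text:
                        yield int(ref[len(letters):]), text
                elif elem.tag == NS + "row" and sheet_data is not None:
                    # Drop processed rows so the parsed tree does not keep growing.
                    sheet_data.clear()
```

For example, `for row_number, text in iter_column("xxx.xlsx"):` visits column B one cell at a time, and each result can be written out immediately with `csv.writer` instead of being kept in a list.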

Answer 2 (score: 1)

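I don't see the point of loading one spreadsheet file, copying everything from it into a second one, and then saving the second... that is going to be memory- and performance-intensive.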

Why not load the first one, remove header row 1, and then save it as CSV output?

// Read the original spreadsheet
$inputFileName = 'TraiDBDump.xlsx';

$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);

// Remove header row
$objPHPExcel->getSheet(0)->removeRow(1, 1);

// Save as a csv file
$objWriter = new PHPExcel_Writer_CSV($objPHPExcel);
$objWriter->save("TraiDBDump.csv");

If your original file has many columns and you only need A and B, you can use a read filter so that only those two columns are read.