Loading a CSV file with 2.5 million records

Time: 2014-09-29 14:53:22

Tags: php arrays csv import

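I have a large CSV file containing every postcode in the UK. It has 2,558,797 records, and I need to import it and sort the data into a multidimensional array so that I can manipulate it before saving it to the database.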

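The problem is that if I try to process the whole file, I get an "allowed memory size exhausted" error. I can process about 128,000 records in one go. Is there a way to split the task up so that I can process the whole file? I have looked at fseek, but it works with byte offsets rather than line numbers, and I don't know how many bytes 128,000 lines take up.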
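
One way around the bytes-versus-lines mismatch is to let PHP report the byte offset instead of computing it: `ftell` returns the current position after a batch of rows has been read, and that value can be passed to `fseek` when the next batch starts. The following is a minimal sketch of that idea, not a tested solution for this import. The helper name `read_csv_chunk` and its return shape are assumptions made up for illustration.

```php
<?php
// Hypothetical helper (not part of Kohana or PHP): reads up to $maxRows CSV
// rows starting at byte $offset, and returns the rows together with the byte
// offset at which the next chunk should start.
function read_csv_chunk($path, $offset, $maxRows)
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException('Unable to open '.$path);
    }

    // Jump straight to where the previous chunk stopped.
    fseek($fp, $offset);

    $rows = array();
    // Delimiter, enclosure and escape are spelled out to match fgetcsv's
    // long-standing defaults.
    while (count($rows) < $maxRows
        && ($line = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        if ($line === array(null)) {
            continue; // fgetcsv returns array(null) for a blank line
        }
        $rows[] = $line;
    }

    // Byte position just after the last row that was read.
    $next = ftell($fp);
    fclose($fp);

    return array($rows, $next);
}
```

A caller would start at offset 0 and keep passing the returned offset back in until the returned row list is empty. Only one chunk of rows is held in memory at a time, although anything accumulated across chunks (such as the `$update` array) still grows.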

How can I process the entire file without exceeding the memory limit? I have been trying to get this working for the past 6 hours and have had no luck.

Here is my code so far:

// This script takes a long time to run
ini_set('max_execution_time', 300);

// First we need to verify the files that have been uploaded.
$file = Validation::factory($_FILES);
$file->rule('import_update_file', 'Upload::not_empty');
$file->rule('import_update_file', 'Upload::valid');
$file->rule('import_update_file', 'Upload::size', array(':value', '8M'));
$file->rule('import_update_file', 'Upload::type', array(':value', array('zip')));
if (Request::current()->method() == Request::POST && $file->check())
{
    $file_name = date('Y-m-d-').'update.zip';
    $dir = Upload::save($file['import_update_file'], $file_name);
    if ($dir === false)
    {
        throw new Kohana_Exception('Unable to save uploaded file!', NULL, 1);
    }
    $zip = new ZipArchive;
    // Keep the result of open() so the error code can be included in the message.
    $res = $zip->open($dir);
    if ($res !== TRUE)
    {
        throw new Kohana_Exception('Unable to open uploaded zip file! Error: '.$res, NULL, 1);
    }
    $zip->extractTo(realpath(Upload::$default_directory), array('localauthority.csv', 'postcode.csv'));
    $zip->close();

    if( ! file_exists(realpath(Upload::$default_directory).DIRECTORY_SEPARATOR.'localauthority.csv') OR 
        ! file_exists(realpath(Upload::$default_directory).DIRECTORY_SEPARATOR.'postcode.csv'))
    {
        throw new Kohana_Exception('Missing file from uploaded zip archive! Expected localauthority.csv and postcode.csv', NULL, 1);
    }
    $local_authorities = Request::factory('local_authority/read')->execute();

    // We start by combining the files, sorting the postcodes and local authority names under the local authority codes.
    $update = array();
    if (($fp = fopen(realpath(Upload::$default_directory).DIRECTORY_SEPARATOR.'localauthority.csv', 'r')) === FALSE)
    {
        throw new Kohana_Exception('Unable to open localauthority.csv file.', NULL, 1);
    }
    while (($line = fgetcsv($fp)) !== FALSE)
    {
        // Column 0 = Local Authority Code
        // Column 1 = Local Authority Name
        $update[$line[0]] = array(
            'name'      => $line[1],
            'postcodes' => array()
        );
    }
    fclose($fp);
    unlink(realpath(Upload::$default_directory).DIRECTORY_SEPARATOR.'localauthority.csv');

    if (($fp = fopen(realpath(Upload::$default_directory).DIRECTORY_SEPARATOR.'postcode.csv', 'r')) === FALSE)
    {
        throw new Kohana_Exception('Unable to open postcode.csv file.', NULL, 1);
    }
    $i = 1;
    while (($line = fgetcsv($fp)) !== FALSE && $i <= 128000)
    {
        $postcode = trim(substr($line[0], 0, 4));
        echo "Line ".sprintf("%03d", $i++) . ": Postcode: ".$line[0]."; Shortened Postcode: ".$postcode."; LAC: ".$line[1]."<br>";
        // Column 0 = Postcode
        // Column 1 = Local Authority Code
        if ( ! array_key_exists($line[1], $update))
        {
            echo $line[1]." not in array<br>";
            continue;
        }
        if ( ! in_array($postcode, $update[$line[1]]['postcodes']))
        {
            $update[$line[1]]['postcodes'][] = $postcode;
        }
    }
    fclose($fp);
    unlink(realpath(Upload::$default_directory).DIRECTORY_SEPARATOR.'postcode.csv');

    echo '<pre>'; var_dump($update); echo '</pre>';
}
else
{
    throw new Kohana_Exception('Invalid file uploaded!', NULL, 1);
}
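
The postcode loop above could also be restructured so that it holds less in memory per row. The sketch below assumes, without confirmation from the post, that the per-line `echo` (which may be buffered by the framework) and the `in_array` scan contribute to the cost. It streams rows, stays silent, and uses array keys as a set. The helper name `collect_short_postcodes` and its return shape are hypothetical.

```php
<?php
// Hypothetical helper: streams postcode rows from an already opened file
// handle and records each shortened postcode once per local authority code.
// $update has the same shape as in the question:
//   code => array('name' => ..., 'postcodes' => array())
function collect_short_postcodes($fp, array $update)
{
    $skipped = 0;

    while (($line = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        if ( ! isset($line[1])) {
            continue; // blank or malformed row
        }

        // Column 0 = Postcode, Column 1 = Local Authority Code
        $postcode = trim(substr($line[0], 0, 4));
        $code = $line[1];

        if ( ! isset($update[$code])) {
            $skipped++; // count unknown codes instead of echoing each one
            continue;
        }

        // Array keys act as a set: duplicates collapse, no in_array() scan.
        $update[$code]['postcodes'][$postcode] = true;
    }

    // Convert each key set back into a plain list of postcodes.
    foreach ($update as $code => $entry) {
        $update[$code]['postcodes'] = array_keys($entry['postcodes']);
    }

    return array($update, $skipped);
}
```

The function takes the handle returned by `fopen` and the `$update` array built from localauthority.csv, and returns the filled array plus a count of rows whose local authority code was not found. Whether this is enough to stay under the memory limit for all 2,558,797 rows is not something the post lets us verify.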

0 answers:

No answers