Reading a large CSV file with PHP

Posted: 2010-05-26 15:42:53

Tags: php csv

I have a very large CSV file: 51427 lines, to be exact.

Is there a way to read only the lines I need into an array? That would speed things up considerably.

7 answers:

Answer 0 (score: 2)

Do you have direct access to the database server?

If so, I would consider using a third-party program such as SQLyog to import your CSV.

You could also upload the file and import the data directly from the mysql shell:

LOAD DATA INFILE '/path/to/your_file.csv' INTO TABLE table_name FIELDS TERMINATED BY ',';

Answer 1 (score: 2)

"This will read the entire CSV file into an array"

All 50,000+ lines?

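Use PHP to advance through the file line by line with fgets() until you reach the start of the block you need, then add each line you need to an array; fgetcsv() will give you each of those lines as an array.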

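EDIT: I don't know the exact figures, but I'd expect reading everything into a data structure to cost far more than reading only the part you need.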
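A minimal sketch of this approach, assuming 0-based line numbers, a comma delimiter, and that no quoted field contains a line break (the skipping phase counts physical lines with fgets()). The function name readCsvRows() is hypothetical:

```php
<?php
// Hypothetical helper: return CSV rows $first..$last (0-based, inclusive)
// without loading the rest of the file into memory.
// Assumes no quoted field contains a line break, because lines before
// $first are skipped with fgets(), which counts physical lines.
function readCsvRows(string $path, int $first, int $last): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Phase 1: skip the unwanted leading lines without parsing them.
        for ($i = 0; $i < $first; $i++) {
            if (fgets($handle) === false) {
                return []; // the file has fewer lines than $first
            }
        }

        // Phase 2: parse only the wanted lines into arrays.
        $rows = [];
        for ($i = $first; $i <= $last; $i++) {
            $row = fgetcsv($handle, 0, ',', '"', '\\');
            if ($row === false) {
                break; // end of file reached early
            }
            $rows[] = $row;
        }
        return $rows;
    } finally {
        fclose($handle); // runs on every return path
    }
}
```

For example, readCsvRows('data.csv', 1000, 1099) would return up to 100 rows starting at the 1001st line, and only those rows are ever held in memory (data.csv is a placeholder name).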

Answer 2 (score: 2)

You might want to look into streaming the CSV file. Pass the file, the starting position and the number of bytes to read as GET parameters to ProgressiveReader.php:

<?php

class NoFileFoundException extends Exception {
    function __toString() {
        return '<h1><b>ERROR:</b> could not find ('
                    .$this->getMessage().
                    ') please check your settings.</h1>';
    }
}

class NoFileOpenException extends Exception {
    function __toString() {
        return '<h1><b>ERROR:</b> could not open ('
                    .$this->getMessage().
                    ') please check your settings.</h1>';
    }
}

interface Reader {
    function setFileName($fName);
    function open();
    function setBufferOffset($offset);
    function bufferSize();
    function isOffset();
    function setPacketSize($size);
    function read();
    function isEOF();
    function close();
    function readAll();
}

class ProgressiveReader implements Reader {
    private $fName;
    private $fileHandler;
    private $offset = 0;
    private $packetSize = 0;

    public function setFileName($fName) {
        $this->fName = $fName;
        if(!file_exists($this->fName)) {
            throw new NoFileFoundException($this->fName);
        }
    }

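    public function open() {
        // fopen() does not throw: it emits a warning and returns false,
        // so check the return value instead of wrapping it in try/catch.
        $handle = @fopen($this->fName, 'rb');
        if ($handle === false) {
            throw new NoFileOpenException($this->fName);
        }
        $this->fileHandler = $handle;
        fseek($this->fileHandler, $this->offset);
    }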

    public function setBufferOffset($offset) {
        $this->offset = $offset;
    }

    public function bufferSize() {
        return filesize($this->fName) - (($this->offset > 0) ? ($this->offset  + 1) : 0);
    }

    public function isOffset() {
        return $this->offset !== 0;
    }

    public function setPacketSize($size) {
        $this->packetSize = $size;
    }

    public function read() {
        return fread($this->fileHandler, $this->packetSize);
    }

    public function isEOF() {
        return feof($this->fileHandler);
    }

    public function close() {
        if($this->fileHandler) {
            fclose($this->fileHandler);
            // Forget the handle so that a second close() is a no-op.
            $this->fileHandler = null;
        }
    }

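    public function readAll() {
        // Read from the current position to the end of the file; unlike
        // fread(..., filesize(...)) this also works for an empty file.
        return stream_get_contents($this->fileHandler);
    }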
}

Here are the unit tests:

<?php

require_once 'PHPUnit/Framework.php';

require_once dirname(__FILE__).'/../ProgressiveReader.php';

class ProgressiveReaderTest extends PHPUnit_Framework_TestCase {

    protected $reader;
    private $fp;
    private $fname = "Test.txt";

    protected function setUp() {
        $this->createTestFile();
        $this->reader = new ProgressiveReader();
    }

    protected function tearDown() {
        $this->reader->close();
        // Remove the fixture so no Test.txt is left behind.
        $this->deleteTestFile();
    }

    public function test_isValidFile() {
        $this->reader->setFileName($this->fname);
    }

    public function test_isNotValidFile() {
        try {
            $this->reader->setFileName("nothing.tada");
        }
        catch (Exception $e) {
            return;
        }

        $this->fail();
    }

    public function test_isFileOpen() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
    }

    public function test_couldNotOpenFile() {
        $this->reader->setFileName($this->fname);
        try {
            $this->deleteTestFile();
            $this->reader->open();
        }
        catch (Exception $e) {
            return;
        }

        $this->fail();
    }

    public function test_bufferSizeZeroOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
        $this->assertEquals($this->reader->bufferSize(), 12);
    }

    public function test_bufferSizeTwoOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->open();
        $this->assertEquals($this->reader->bufferSize(), 9);
    }

    public function test_readBuffer() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertEquals($this->reader->read(), "T");
    }

    public function test_readBufferWithOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertEquals($this->reader->read(), "S");
    }

    public function test_readSuccesive() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(6);
        $this->reader->open();
        $this->assertEquals($this->reader->read(), "TEST1\n");
        $this->assertEquals($this->reader->read(), "TEST2\n");
    }

    public function test_readEntireBuffer() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
        $this->assertEquals($this->reader->readAll(), "TEST1\nTEST2\n");
    }

    public function test_isNotEOF() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertFalse($this->reader->isEOF());
    }

    public function test_isEOF() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(15);
        $this->reader->open();
        $this->reader->read();
        $this->assertTrue($this->reader->isEOF());
    }

    public function test_isOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->assertTrue($this->reader->isOffset());
    }

    public function test_isNotOffset() {
        $this->reader->setFileName($this->fname);
        $this->assertFalse($this->reader->isOffset());
    }

    private function createTestFile() {
        $this->fp = fopen($this->fname, "wb");
        fwrite($this->fp, "TEST1\n");
        fwrite($this->fp, "TEST2\n");
        fflush($this->fp);
        fclose($this->fp);
    }

    private function deleteTestFile() {
        if(file_exists($this->fname)) {
            unlink($this->fname);
        }

    }
}
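ProgressiveReader hands back raw byte packets, so a packet can begin or end in the middle of a CSV row. Below is a minimal sketch of the same offset-based streaming idea with row-aligned chunks; it is not part of the original answer, the function readCsvChunk() is hypothetical, and it assumes "\n" line endings and no line breaks inside quoted fields:

```php
<?php
// Hypothetical sketch: parse the CSV rows that START inside the byte window
// [$offset, $offset + $length). Returns the rows plus the offset at which
// the next chunk should begin. Assumes "\n" line endings and no line
// breaks inside quoted fields.
function readCsvChunk(string $path, int $offset, int $length): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        if ($offset > 0) {
            // Step back one byte and discard the rest of that line: a row
            // starting exactly at $offset is kept, while a row that started
            // earlier belongs to the previous chunk.
            fseek($handle, $offset - 1);
            fgets($handle);
        }

        $rows = [];
        $end = $offset + $length;
        // Keep reading while the next row starts inside the window.
        while (ftell($handle) < $end) {
            $row = fgetcsv($handle, 0, ',', '"', '\\');
            if ($row === false) {
                break; // end of file
            }
            $rows[] = $row;
        }
        return ['rows' => $rows, 'next' => ftell($handle)];
    } finally {
        fclose($handle);
    }
}
```

A client could start at offset 0 and pass the returned 'next' value as the offset of its following request. Because each chunk owns exactly the rows that start inside its byte window, consecutive chunks never split or duplicate a row.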

Answer 3 (score: 1)

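Your script is probably taking a long time to finish.

Look for the max_execution_time directive in php.ini and set it to a value that suits you.

The default max_execution_time is 30 seconds (the command-line SAPI uses 0, meaning no limit), so your script may be getting killed.

If you want to change the limit for just one script, call set_time_limit() in that script.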
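According to the PHP manual, each call to set_time_limit() restarts the timeout counter from zero, so a long import can call it once per batch instead of disabling the limit entirely. A minimal sketch; the function name countCsvRowsWithTimeLimit(), the batch size, and the 30-second value are illustrative assumptions:

```php
<?php
// Hypothetical helper: count the CSV rows in $path, calling set_time_limit()
// every $batchSize rows. Each call restarts PHP's timeout counter from zero,
// so the script is only killed if a single batch takes longer than
// $secondsPerBatch. Both defaults are arbitrary example values.
function countCsvRowsWithTimeLimit(string $path, int $batchSize = 1000, int $secondsPerBatch = 30): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    while (fgetcsv($handle, 0, ',', '"', '\\') !== false) {
        $count++;
        if ($count % $batchSize === 0) {
            set_time_limit($secondsPerBatch); // restart the timeout counter
        }
    }

    fclose($handle);
    return $count;
}
```

In a real import the counting would be replaced by whatever work is done per row; the point of the design is that the limit still catches a genuinely stuck batch.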

Answer 4 (score: 1)

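Have you tried importing the CSV into MySQL from bash/the shell (if you're on Linux)? You could also use Ruby, Perl or the like; I think you should use one of those rather than PHP (or any web application) to import the file.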

Answer 5 (score: 1)

I'd suggest the fast MySQL LOAD DATA INFILE command:

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

If that's not an option, you can split the CSV file into smaller files (assuming you have shell access).
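With shell access, the standard split utility does this (for example, split -l 10000 big.csv part_, where big.csv is a placeholder). Where only PHP is available, here is a minimal sketch that splits line by line; splitCsv(), the part-file naming scheme, and the assumption that no quoted field spans several lines are all illustrative:

```php
<?php
// Hypothetical helper: split $path into files of at most $linesPerFile lines,
// named "<prefix>0.csv", "<prefix>1.csv", ... and return the created names.
// Streams line by line, so memory use does not grow with the file size.
// Assumes no quoted field contains a line break.
function splitCsv(string $path, string $prefix, int $linesPerFile): array
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $files = [];
    $out = null;
    $lineNo = 0;
    while (($line = fgets($in)) !== false) {
        if ($lineNo % $linesPerFile === 0) {
            // Time to start a new part file.
            if ($out !== null) {
                fclose($out);
            }
            $name = $prefix . count($files) . '.csv';
            $out = fopen($name, 'wb');
            if ($out === false) {
                fclose($in);
                throw new RuntimeException("Cannot create $name");
            }
            $files[] = $name;
        }
        fwrite($out, $line);
        $lineNo++;
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $files;
}
```

Note that if the CSV has a header row, only the first part file keeps it.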

Answer 6 (score: 0)

Bah! Ignore this answer, it's a duplicate. See the fgetcsv() suggestion from Scorchio above.