Handling large amounts of data in PHP without using ini_set('memory_limit', '-1')

Time: 2016-02-03 18:51:41

Tags: php codeigniter

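I am trying to build a sentiment classifier in PHP. I have a bigrams file containing 1,020,386 records. Loading the file works fine, but as soon as I perform operations on it I get "Allowed memory size of 134217728 bytes exhausted". I tried operating on 1,000 records at a time, but the problem persists. I am using CodeIgniter and a file helper class.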

 $reader = new FilePrep();
 $content = $reader->read(base_url().'Assets/files/w2_.txt');
 $delimited = explode(PHP_EOL, $content);
 $ngrams = array();
 for($from = 0; $to = sizeof($delimited) ; $from+=1000){
        $new = array_slice($ngrams, $from, 1000);
        foreach($new as $ngram){
            $del = explode(' ', $ngram);
            array_push($ngrams, array($del[0],$del[1].' '.$del[2]));
        }
 }
 print_r($ngrams);

FilePrep.php

public function read($path){
    $handle = fopen($path,'r');
    $string = stream_get_contents($handle);
    return $string;
}

Thanks in advance.

1 Answer:

Answer 0 (score: 1)

UPDATE:

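I found a few problems in your code. First, your for loop is broken: it has no exit condition ($to = sizeof($delimited) is an assignment, which is always truthy), so PHP never stops and you eventually get the memory error. Second, you load the whole file into a variable and then split it into an array; a better approach is to read the file line by line and assign the data as you go. Finally, you cannot fit 1M rows into a two-dimensional array within this limit. Every array_push increases PHP's memory usage by roughly 650 bytes (based on your w2_.txt sample file). At about 640 bytes per entry, 1,020,386 entries would need on the order of 650 MB, far more than the 128 MB (134217728 bytes) limit.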

Take a look at the code below. It shows how PHP's memory usage grows as data is added to the array:

Example:

$handle = fopen('w2_.txt', 'r');

$ngrams = array();
$i=0;
echo $i . ': ' . memory_get_usage() . "\t";

$current_memory_usage = memory_get_usage();
while (($line = fgets($handle, 8192)) !== false) {
    $i++;
    echo "File line # $i \t";
    $del = explode("\t", $line);
    array_push($ngrams, array($del[0],$del[1].' '.$del[2]));

    echo "[ rised: " . (memory_get_usage() - $current_memory_usage) . "\t total: " . memory_get_usage() . "]\n";
    $current_memory_usage = memory_get_usage();
}
fclose($handle); 

This gives the output:

FILE              MEMORY RISED   TOTAL MEMORY USED
File line # 1   [ rised: 9984    total: 254248]
File line # 2   [ rised: 616     total: 254864]
File line # 3   [ rised: 648     total: 255512]
File line # 4   [ rised: 632     total: 256144]
File line # 5   [ rised: 640     total: 256784]
File line # 6   [ rised: 640     total: 257424]
File line # 7   [ rised: 640     total: 258064]
File line # 8   [ rised: 640     total: 258704]
File line # 9   [ rised: 704     total: 259408]
File line # 10  [ rised: 656     total: 260064]
File line # 11  [ rised: 624     total: 260688]
File line # 12  [ rised: 640     total: 261328]
File line # 13  [ rised: 640     total: 261968]
File line # 14  [ rised: 640     total: 262608]
File line # 15  [ rised: 640     total: 263248]
File line # 16  [ rised: 640     total: 263888]
File line # 17  [ rised: 768     total: 264656]
File line # 18  [ rised: 640     total: 265296]
File line # 19  [ rised: 640     total: 265936]
File line # 20  [ rised: 640     total: 266576]
File line # 21  [ rised: 640     total: 267216]
File line # 22  [ rised: 640     total: 267856]
...
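
Since every stored entry costs memory, one way to stay under the limit is to handle each line as it is read instead of collecting all of them. The following is only a minimal sketch of that idea, not the asker's or CodeIgniter's code: readNgrams() is a hypothetical helper, the sample rows are made up, the fields are assumed to be tab-separated as in the example above, and the type declarations assume PHP 7 or later.

```php
<?php
// Sketch: stream the bigram file line by line instead of storing every row.
// readNgrams() is a hypothetical helper, not part of CodeIgniter or FilePrep.

/**
 * Yield one [first field, "second third"] pair per line.
 * Assumes tab-separated lines with at least three fields.
 */
function readNgrams(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $del = explode("\t", rtrim($line, "\r\n"));
            if (count($del) < 3) {
                continue; // skip blank or malformed lines
            }
            yield [$del[0], $del[1] . ' ' . $del[2]];
        }
    } finally {
        fclose($handle); // runs when the generator finishes or is destroyed
    }
}

// Demo with a small temporary file standing in for w2_.txt (made-up rows).
$path = tempnam(sys_get_temp_dir(), 'ngrams');
file_put_contents($path, "275\ta\ta\n31\ta\taaa\n\n29\ta\tall\n");

$count = 0;
$last = null;
foreach (readNgrams($path) as $ngram) {
    // Work on one pair here (update a counter, write it to a database, ...).
    $count++;
    $last = $ngram;
}
unlink($path);

echo "$count pairs processed, peak memory: " . memory_get_peak_usage() . " bytes\n";
```

Only one pair is held in memory at a time, so memory use should stay roughly flat regardless of file size, as long as the loop body does not itself accumulate the pairs in an array.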

OLD ANSWER:

Not sure how much this will help.

1) Try replacing:

foreach($new as $ngram)

with:

foreach($new as &$ngram)

foreach iterates over a copy of the variable. If you make it a reference ('&$ngram'), you save memory by operating on the same variable.

2) If a variable is no longer used, clear it:

$del = null;

3) You can add this to your source code:

echo memory_get_usage() . "\n";

That way you can see where too much memory is being consumed.
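
The three tips can be combined roughly as follows. This is a sketch on a tiny made-up array, not the asker's data. Note that PHP arrays are copy-on-write, so how much a by-reference loop actually saves is not guaranteed and may depend on the PHP version; the unset() after the loop is needed because the reference otherwise stays bound to the last element.

```php
<?php
// Sketch of the three tips on a tiny, made-up array.
$rows = ['1 a b', '2 c d', '3 e f'];

$before = memory_get_usage();

// 1) Iterate by reference to modify the elements in place.
foreach ($rows as &$row) {
    $del = explode(' ', $row);
    $row = [$del[0], $del[1] . ' ' . $del[2]];
}
unset($row); // break the reference; otherwise later writes to $row change the last element

// 2) Release temporaries that are no longer needed.
$del = null;

// 3) Report memory use at the point of interest.
echo 'used since start: ' . (memory_get_usage() - $before) . " bytes\n";
echo 'peak: ' . memory_get_peak_usage() . " bytes\n";
```

memory_get_peak_usage() complements memory_get_usage() by reporting the highest value reached so far, which helps when the spike happens between two measurement points.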

Good luck!