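I am trying to build a sentiment classifier in PHP. I have a bigrams file containing 1,020,386 records. Loading the file works fine, but as soon as I perform operations on it I get "Allowed memory size of 134217728 bytes exhausted". I tried processing 1000 records at a time, but the problem persists. I am using CodeIgniter and a file helper class.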
$reader = new FilePrep();
$content = $reader->read(base_url().'Assets/files/w2_.txt');
$delimited = explode(PHP_EOL, $content);
$ngrams = array();
for($from = 0; $to = sizeof($delimited) ; $from+=1000){
$new = array_slice($ngrams, $from, 1000);
foreach($new as $ngram){
$del = explode(' ', $ngram);
array_push($ngrams, array($del[0],$del[1].' '.$del[2]));
}
}
print_r($ngrams);
FilePrep.php
public function read($path){
$handle = fopen($path,'r');
$string = stream_get_contents($handle);
return $string;
}
Thanks in advance.
Answer 0 (score: 1)
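I found a few problems in your code. First, your for loop has no exit condition: $to = sizeof($delimited) is an assignment rather than a comparison, so it is always true, PHP never stops working, and you get the memory error. Second, you load the whole file into a variable and then split it into an array; a better approach is to read the file line by line and assign the data as you go. Last, you cannot fit 1M lines into a two-dimensional array variable. Each array_push costs PHP about 650 bytes of memory (based on a sample of your w2_.txt file), so 1,020,386 records would need roughly 660 MB, well above the 134217728-byte (128 MB) limit.
Take a look at the following code. It shows how PHP's memory usage grows as data is added to the array: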
$handle = fopen('w2_.txt', 'r');
$ngrams = array();
$i=0;
echo $i . ': ' . memory_get_usage() . "\t";
$current_memory_usage = memory_get_usage();
while (($line = fgets($handle, 8192)) !== false) {
$i++;
echo "File line # $i \t";
$del = explode("\t", $line);
array_push($ngrams, array($del[0],$del[1].' '.$del[2]));
echo "[ rised: " . (memory_get_usage() - $current_memory_usage) . "\t total: " . memory_get_usage() . "]\n";
$current_memory_usage = memory_get_usage();
}
fclose($handle);
This gives the output:
FILE MEMORY RISED TOTAL MEMORY USED
File line # 1 [ rised: 9984 total: 254248]
File line # 2 [ rised: 616 total: 254864]
File line # 3 [ rised: 648 total: 255512]
File line # 4 [ rised: 632 total: 256144]
File line # 5 [ rised: 640 total: 256784]
File line # 6 [ rised: 640 total: 257424]
File line # 7 [ rised: 640 total: 258064]
File line # 8 [ rised: 640 total: 258704]
File line # 9 [ rised: 704 total: 259408]
File line # 10 [ rised: 656 total: 260064]
File line # 11 [ rised: 624 total: 260688]
File line # 12 [ rised: 640 total: 261328]
File line # 13 [ rised: 640 total: 261968]
File line # 14 [ rised: 640 total: 262608]
File line # 15 [ rised: 640 total: 263248]
File line # 16 [ rised: 640 total: 263888]
File line # 17 [ rised: 768 total: 264656]
File line # 18 [ rised: 640 total: 265296]
File line # 19 [ rised: 640 total: 265936]
File line # 20 [ rised: 640 total: 266576]
File line # 21 [ rised: 640 total: 267216]
File line # 22 [ rised: 640 total: 267856]
...
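Since each stored record costs roughly 640 bytes in the output above, one way around the limit is to avoid storing the records at all. The following is a minimal sketch, not the original poster's code: it assumes PHP 7.1 or later and three tab-separated fields per line, and the function name readNgrams and the two sample lines are made up for illustration. It uses a generator so only one line is held in memory at a time.

```php
<?php
// Hypothetical helper: yields one parsed record at a time
// instead of building a 1M-element array.
function readNgrams(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            // Strip the line ending so it does not end up in the last field.
            $del = explode("\t", rtrim($line, "\r\n"));
            if (count($del) < 3) {
                continue; // skip blank or malformed lines
            }
            yield [$del[0], $del[1] . ' ' . $del[2]];
        }
    } finally {
        fclose($handle);
    }
}

// Made-up sample data standing in for w2_.txt (assumed format).
$path = tempnam(sys_get_temp_dir(), 'ngr');
file_put_contents($path, "12\tnot\tgood\n7\tvery\tnice\n");

$count = 0;
$last = null;
foreach (readNgrams($path) as [$first, $rest]) {
    // Process each record here (score it, write it to a database, ...).
    $last = [$first, $rest];
    $count++;
}
unlink($path);

echo $count . " records, peak memory: " . memory_get_peak_usage() . " bytes\n";
```

With this shape, memory use stays roughly flat regardless of file size, as long as the loop body does not accumulate the records itself.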
Not sure how much this will help.
1) Try replacing:
foreach($new as $ngram)
with
foreach($new as &$ngram)
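foreach iterates over a copy of the variable. If you make it a reference ('&$ngram'), the loop operates on the same variable, which may save memory.
2) If a variable is no longer used, clear it.
$del = null;
3) You can add this to your source code: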
echo memory_get_usage() . "\n";
That way you can see where too much memory is being consumed.
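Points 2) and 3) can be combined into a small checkpoint helper. This is only a sketch: the function name memoryCheckpoint is hypothetical, and the range() call is a made-up workload standing in for the bigram array. It uses unset(), which releases the value much as assigning null does.

```php
<?php
// Hypothetical helper: prints memory use at a named checkpoint and
// returns the change in bytes since the previous checkpoint.
function memoryCheckpoint(string $label): int
{
    static $previous = null;
    $current = memory_get_usage();
    $delta = $previous === null ? 0 : $current - $previous;
    printf("%-16s total: %d bytes, change: %+d bytes\n", $label, $current, $delta);
    $previous = $current;
    return $delta;
}

memoryCheckpoint('start');

// Made-up workload standing in for the bigram array.
$data = range(1, 100000);
$grown = memoryCheckpoint('after build');

// Point 2: release a variable once it is no longer needed.
unset($data);
$freed = memoryCheckpoint('after unset');
```

Placing checkpoints around each stage of the script (read, split, loop) shows which stage is responsible for most of the growth.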
Good luck!