我试图读取一个大文件(大约500万行),它一直达到内存限制。有没有办法可以将文件读取到特定行,然后递增计数器并从下一行继续?
以下是我正在使用的代码,如何添加指向fgets起始行的指针?
$handle = @fopen("large_file.txt", "r");
if($handle){
while(($buffer = fgets($handle, 4096)) !== false){
//get the content of the line
}
}
我没有尝试只阅读一条特定的线路,我试图从第1行到第10,000行读取,然后从第10,001行重新开始到另外10,000行,就像那样。
答案 0 :(得分:0)
我认为答案会在Reading specific line of a file in php
您可以使用搜索获取特定的行位置
$file = new SplFileObject('yourfile.txt');
$file->seek(123); // seek to line 124 (0-based)
答案 1 :(得分:0)
尝试使用此功能,您应该使用迭代并逐行获取。
$file = new SplFileObject('yourfile.txt');
echo getLineRange(1,10000);
function getLineRange($start,$end){
$tmp = "";
for ($i = $start; $i <= $end; $i++) {
$tmp .= $file->seek($i);
}
return($tmp);
}
答案 2 :(得分:0)
可以使用fseek()
/ ftell()
在PHP中批量处理块中的大文件,并在块之间保存上下文。 (SplFileObject::seek()
可以直接搜索,但似乎有performance issues with large files。)
假设您有某种批处理器可用,以下示例应该可以让您了解该方法。它未经测试,但源自生产中的代码。
<?php
$context = array(
'path' => 'path/to/file',
'limit' => 1000,
'line' => NULL,
'position' => NULL,
'size' => NULL,
'percentage' => 0,
'complete' => FALSE,
'error' => FALSE,
'message' => NULL,
);
function do_chunk($context) {
$handle = fopen($context['path'], 'r');
if (!$handle) {
$context['error'] = TRUE;
$context['message'] = 'Cannot open file for reading: ' . $context['path'];
return;
}
// One-time initialization of file parameters.
if (!isset($context['size'])) {
$fstat = fstat($handle);
$context['size'] = $fstat['size'];
$context['position'] = 0;
$context['line'] = 0;
}
// Seek to position for current chunk.
$ret = fseek($handle, $context['position']);
if ($ret === -1) {
$context['error'] = TRUE;
$context['message'] = 'Cannot seek to ' . $context['position'];
fclose($handle);
return;
}
$k = 1;
do {
$context['line']++;
$raw_line = fgets($handle);
if ($raw_line) {
// Strip newline.
$line = rtrim($raw_line);
// Code to process line here.
list($error, $message) = my_process_line($context, $line);
if ($error) {
$context['error'] = TRUE;
$context['message'] = $message;
fclose($handle);
return;
}
} elseif (($raw_line === FALSE) && !feof($handle)) {
$context['error'] = TRUE;
$context['message'] = 'Unexpected error reading ' . $context['path'];
fclose($handle);
return;
}
}
while ($k++ < $context['limit'] && $raw_line);
// Save position of next chunk.
$position = ftell($handle);
if ($position !== FALSE) {
$context['position'] = $position;
} else {
$context['error'] = TRUE;
$context['message'] = 'Cannot retrieve file pointer in ' . $context['path'];
fclose($handle);
return;
}
if (!$raw_line) {
$context['complete'] = TRUE;
$context['percentage'] = 1;
} else {
$context['percentage'] = $context['position'] / $context['size'];
}
fclose($handle);
}
// Batch driver for testing only - use a batch processor in production.
while ($context['complete']) {
do_batch($context);
}
if ($context['error']) {
print 'error: ' . $context['message'];
} else {
print 'complete';
}