我正在尝试仅使用fopen()
和fseek()
来获取特定的代码行(不仅是一行,我需要获取当前搜索行的上方和下方的行)。
为了提高性能,我知道如何获取特定的行以进行搜索然后退出。如果我需要第5行,那么应该可以搜索到4和6。
这是一个代码,用于获取每行的字节,然后将其作为键作为行放入数组,将值作为字节放入EOF
。
$fh = fopen($source, 'r');
$meta = stream_get_meta_data($fh);
if (!$meta['seekable']) {
throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
}
$line = fgets($fh, 4096);
$pos = -1;
$i = 0;
$result = null;
$linenum = 10;
var_dump('Line num:'.$linenum);
$total_lines = null;
// Get seek byte end of each line
while (!feof($fh)) {
$char = fgetc($fh);
if ($char != "\n" && $char != "\r") {
$total_lines[$i] = $pos;
$pos++;
} else {
$i++;
}
//var_dump(fgets($fh).' _ '.$pos);
}
// Now get specific lines (line 5, line 6 and line 7)
$seekssearch = array($total_lines[5], $total_lines[6], $total_lines[7]);
$result = null;
$posr = 0;
foreach ($seekssearch as $sk) {
while (!feof($fh)) {
if ($char != "\n" && $char != "\r") {
fseek($fh, $sk, SEEK_SET);
$posr++;
} else {
$ir++;
}
}
// Merge result of line 5,6 and 7
$result .= fgets($fh);
}
echo $result;
exit;
while (!feof($fh) && $i<($linenum)) {
$char = fgetc($fh);
if ($char != "\n" && $char != "\r") {
fseek($fh, $pos, SEEK_SET);
$pos++;
}
else {
$i++;
}
}
$line = trim(fgets($fh));
var_dump($line);
exit;
exit;
while (!feof($fh) && $i<($linenum-1)) {
$char = fgetc($fh);
if ($char != "\n" && $char != "\r") {
//fseek($fh, $pos);
fseek($fh, $pos);
$pos++;
}
else {
if ($pos == 3) {
$line = fgets($fh);
}
$i++;
}
}
//$line = fgets($fh);
var_dump($line); exit;
如何合并这些行?
注意:我不想使用
splFileInfo
或任何技巧,如数组。只是想寻求然后退出。
答案 0 :(得分:0)
我创建了一个函数,它读取文件并计算行数,并将每行要存储的数据存储到数组中。如果设置了linenum
指定的最大值,那么它将保持性能而不是新循环函数,以寻找以字节为单位的位置来获取文件内容。
我相信这个功能可以改进。
function readFileSeek($source, $linenum = 0, $range = 0)
{
$fh = fopen($source, 'r');
$meta = stream_get_meta_data($fh);
if (!$meta['seekable']) {
throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
}
$pos = 2;
$result = null;
if ($linenum) {
$minline = $linenum - $range - 1;
$maxline = $minline+$range+$range;
}
$totalLines = 0;
while (!feof($fh)) {
$char = fgetc($fh);
if ($char == "\n" || $char == "\r") {
++$totalLines;
} else {
$result[$totalLines] = $pos;
}
$pos++;
if ($maxline+1 == $totalLines) {
// break from while to not read entire file
break;
}
}
$buffer = '';
for ($nr=$minline; $nr<=$maxline; $nr++) {
if (isset($result[$nr])) {
fseek($fh, $result[$nr], SEEK_SET);
while (!feof($fh)) {
$char = fgetc($fh);
if ($char == "\n" || $char == "\r") {
$buffer .= $char;
break;
} else {
$buffer .= $char;
}
}
}
}
return $buffer;
}
测试结果(1.3 GB文件,1亿行代码,寻求300000行代码):
string(55) "299998_abc
299999_abc
300000_abc
300001_abc
300002_abc
"
Time: 612 ms, Memory: 20.00Mb
$ ll -h /tmp/testReadSourceLines_27151460344/41340913936
-rw-rw-r-- 1 1,3G /tmp/testReadSourceLines_27151460344/41340913936