PHP: getting multiple specific lines with fseek

Asked: 2016-06-24 07:59:59

Tags: php fopen fseek fgetc

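I am trying to fetch specific lines of a file using only fopen() and fseek() (not just one line: I also need the lines above and below the line I am seeking).

For performance, I already know how to seek to one specific line and then exit. If I need line 5, it should be possible to seek to lines 4 and 6 as well.

Here is code that walks the file byte by byte and builds an array whose keys are line numbers and whose values are byte positions, up to EOF: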

$fh = fopen($source, 'r');
$meta = stream_get_meta_data($fh);

if (!$meta['seekable']) {
    throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
}

$line = fgets($fh, 4096);
$pos = -1;
$i = 0;

$result = null;

$linenum = 10;
var_dump('Line num:'.$linenum);

$total_lines = null;

// Get seek byte end of each line
while (!feof($fh)) {
    $char = fgetc($fh);

    if ($char != "\n" && $char != "\r") {        
        $total_lines[$i] = $pos;

        $pos++;
    } else {
        $i++;
    }    
    //var_dump(fgets($fh).' _ '.$pos);
}

// Now get specific lines (line 5, line 6 and line 7)
$seekssearch = array($total_lines[5], $total_lines[6], $total_lines[7]);

$result = null;
$posr = 0;
foreach ($seekssearch as $sk) {

    while (!feof($fh)) {

        if ($char != "\n" && $char != "\r") {
            fseek($fh, $sk, SEEK_SET);

            $posr++;
        } else {
            $ir++;
        }
    }

    // Merge result of line 5,6 and 7
    $result .= fgets($fh);
}

echo $result;

exit;

while (!feof($fh) && $i<($linenum)) {
    $char = fgetc($fh);

    if ($char != "\n" && $char != "\r") {
        fseek($fh, $pos, SEEK_SET);
        $pos++;
    } else {
        $i++;
    }
}
$line = trim(fgets($fh));

var_dump($line);

exit;

while (!feof($fh) && $i<($linenum-1)) {
    $char = fgetc($fh);

    if ($char != "\n" && $char != "\r") {
        //fseek($fh, $pos);
        fseek($fh, $pos);
        $pos++;
    } else {

        if ($pos == 3) {
            $line = fgets($fh);
        }

        $i++;
    }
}

//$line = fgets($fh);
var_dump($line); exit;

How can I merge these lines?

Note: I do not want to use SplFileInfo or any tricks such as arrays. I just want to seek and then exit.

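1 answer:

Answer 0 (score: 0)

I created a function that reads the file, counts the lines, and stores the byte offsets to seek to in an array. When linenum is set, the while loop breaks as soon as the last needed line has been passed, so the rest of the file is never read, which keeps performance up. A second loop then seeks to the stored byte positions to fetch the file content.

I believe this function can be improved.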

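function readFileSeek($source, $linenum = 0, $range = 0)
{
    $fh = fopen($source, 'r');
    if ($fh === false) {
        throw new Exception(sprintf("Unable to open source: %s", print_r($source, true)));
    }

    $meta = stream_get_meta_data($fh);

    if (!$meta['seekable']) {
        fclose($fh);
        throw new Exception(sprintf("A source is not seekable: %s", print_r($source, true)));
    }

    // Window of wanted lines (1-based, inclusive). A $linenum below 1 yields an empty window.
    $minline = max(1, $linenum - $range);
    $maxline = ($linenum < 1) ? 0 : $linenum + $range;

    // $result maps a line number to the byte offset at which that line starts.
    $result = array();
    $line = 1;  // number of the line currently being scanned
    $pos = 0;   // byte offset of the next character to read

    if ($minline == 1 && $maxline >= 1) {
        $result[1] = 0;
    }

    // Only "\n" ends a line, so "\r\n" files are not counted twice.
    // The loop stops once $maxline has been passed, so the entire file is not read.
    while ($line <= $maxline && ($char = fgetc($fh)) !== false) {
        $pos++;

        if ($char == "\n") {
            ++$line;

            if ($line >= $minline && $line <= $maxline) {
                $result[$line] = $pos;
            }
        }
    }

    $buffer = '';

    for ($nr = $minline; $nr <= $maxline; $nr++) {

        if (isset($result[$nr])) {

            fseek($fh, $result[$nr], SEEK_SET);

            // fgets() returns the line including its trailing newline, or false at EOF
            $text = fgets($fh);

            if ($text !== false) {
                $buffer .= $text;
            }
        }
    }

    fclose($fh);

    return $buffer;
}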
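
One possible improvement, offered only as a sketch: because the wanted lines are contiguous, a single forward pass with fgets() can collect them and stop, without storing any byte offsets and without a second seek. The name readLineWindow and its parameters are hypothetical and not part of the answer above; the sketch assumes 1-based line numbers and "\n" line endings.

```php
<?php
// Hypothetical helper (an assumption, not the answerer's code): returns lines
// $linenum - $range through $linenum + $range (1-based) as one string.
function readLineWindow(string $source, int $linenum, int $range = 0): string
{
    $fh = fopen($source, 'r');
    if ($fh === false) {
        throw new RuntimeException("Unable to open source: $source");
    }

    $minline = max(1, $linenum - $range);
    $maxline = $linenum + $range;
    $buffer = '';
    $current = 0; // number of lines read so far

    // fgets() reads one line per call, so no byte offsets need to be stored.
    // The loop ends as soon as line $maxline has been read.
    while ($current < $maxline && ($text = fgets($fh)) !== false) {
        $current++;
        if ($current >= $minline) {
            $buffer .= $text;
        }
    }

    fclose($fh);

    return $buffer;
}

// Usage: build a small sample file and fetch line 5 with one line of context.
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, implode("\n", range(1, 10)) . "\n");
$window = readLineWindow($tmp, 5, 1);
unlink($tmp);
echo $window; // prints lines 4, 5 and 6
```

Like the function above, this still has to read every byte before the first wanted line, since line boundaries cannot be known without scanning. What it saves is the offset array and the per-character fgetc() calls.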

Test result (1.3 GB file, 100 million lines, seeking to line 300000):

string(55) "299998_abc
299999_abc
300000_abc
300001_abc
300002_abc
"


Time: 612 ms, Memory: 20.00Mb

$ ll -h /tmp/testReadSourceLines_27151460344/41340913936
-rw-rw-r-- 1  1,3G /tmp/testReadSourceLines_27151460344/41340913936
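
The harness behind this measurement is not shown. A scaled-down way to reproduce a similar test might look like the sketch below. It assumes that line N of the test file holds "N_abc" (as the output above suggests), uses 1,000 lines instead of 100 million, and uses a simple fgets() loop as a stand-in reader. The file name prefix and all variable names are hypothetical.

```php
<?php
// Hypothetical, scaled-down reproduction: 1,000 lines instead of 100 million.
$path = tempnam(sys_get_temp_dir(), 'testReadSourceLines');
$fh = fopen($path, 'w');
for ($i = 1; $i <= 1000; $i++) {
    fwrite($fh, $i . "_abc\n"); // assumption: line N holds "N_abc"
}
fclose($fh);

$target = 300; // line to seek
$range = 2;    // lines of context on each side

$start = microtime(true);

// Stand-in reader: scan with fgets() and keep lines
// $target - $range through $target + $range, then stop.
$fh = fopen($path, 'r');
$out = '';
for ($n = 1; $n <= $target + $range && ($text = fgets($fh)) !== false; $n++) {
    if ($n >= $target - $range) {
        $out .= $text;
    }
}
fclose($fh);

$elapsedMs = (microtime(true) - $start) * 1000;
$peakMb = memory_get_peak_usage(true) / 1048576;
unlink($path);

var_dump($out);
printf("Time: %.2f ms, Memory: %.2fMb\n", $elapsedMs, $peakMb);
```

Timings from a file this small say little about the 1.3 GB case. Raising the line count in the first loop brings the sketch closer to the original test at the cost of disk space.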