PHP: splitting a file after a keyword using stream_get_line

Asked: 2013-12-10 15:48:37

Tags: php file

I have a file that looks like this:

   # Hand 1: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

   # Hand 2: Knockout
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text


   # Hand 3: Knockout
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text


   # Hand 4: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text


   # Hand 5: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

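The file is 400 MB, and I'd like to split it into files of roughly 400 KB each. I'm not sure whether there is a way to split by KB, so I decided to just split by line count, since it doesn't need to be very accurate as long as each file ends up between 400 and 500 KB.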

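I don't want it to split in the middle of a block, so once it reaches 100 lines I want it to split just before the next # Hand section, so that I don't end up with an incomplete block.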

I've tried using the code below, but I get an error, and I couldn't find much documentation online to compare mine against. Any ideas what's wrong?

The end goal is to split after x lines and create lots of 400 KB documents instead of one 400 MB document.

<?php
$filename = fopen('files/text1.txt','r');
$keyword = 'Hand #';

$contents = stream_get_line($filename,100,$keyword);

?><pre><?php
print_r($contents);
?></pre><?php

?>

The error I get is: PHP Warning: stream_get_line() expects parameter 1 to be resource
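That warning means `fopen()` did not return a resource: it returns `false` when the file cannot be opened (a wrong path or missing permissions, for instance), and that `false` is then handed to `stream_get_line()`. Two other details are worth checking: the second argument of `stream_get_line()` is a maximum length in bytes, not a number of lines, and the keyword `'Hand #'` does not match the `# Hand` markers in the file. Below is a minimal sketch of a guarded read. The helper name `read_first_chunk` and the 8192-byte default are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the text that precedes the first
// occurrence of $delimiter, or null if the file cannot be opened.
function read_first_chunk(string $path, string $delimiter, int $maxBytes = 8192): ?string
{
    $handle = @fopen($path, 'r');   // fopen() returns false on failure
    if ($handle === false) {
        return null;                // nothing to read; the caller decides what to do
    }

    // Reads until $delimiter is found, $maxBytes bytes were read, or EOF.
    // The delimiter itself is not part of the returned string.
    $chunk = stream_get_line($handle, $maxBytes, $delimiter);
    fclose($handle);

    return $chunk === false ? '' : $chunk;
}
```

Checking the return value of `fopen()` before using the handle turns the warning into a condition the script can handle explicitly.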

1 Answer:

Answer 0: (score: 0)

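I wrote my own recursive snippet and ran it against a number of test cases; the best result is shown at the end of this post. I hope the comments explain what you need to know.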

<?php
/* Snippet by Revo */
$file = file('your-file.txt'); // the file's contents go into an array, one line per element
$count = count($file); // number of all lines
$length = 500; // minimum number of lines per chunk (500 here; the actual split may fall on line 501, 502, 503, ... depending on where the next boundary is)
$boundary = "# Hand"; // boundary word

/**
 * [split_file split the whole txt file]
 * @param  [int] $old_length     [length to begin job, default is 0]
 * @param  [int] $current_length [length to end job]
 * @param  [string] $boundary       [group of characters that we check on every call]
 * @param  [int] $filename       [counter used to generate numbered file names automatically]
 * @return [true]                 [always returns true]
 */
function split_file($old_length, $current_length, $boundary, $filename)
{
    global $count, $file, $length;
    if(@strpos($file[$current_length], $boundary) === false) // check if at current length we can end it up or not
    {
        if($current_length < $count) // check if current length is lower than total
        {
            if($length <= $count - $current_length)
            {
                $split = array_slice($file, $current_length, $count); // take the remaining lines to look for the nearest line that contains our boundary word
                foreach ($split as $key => $value)
                {
                    if(strpos($value, $boundary) !== false)
                    {
                        $new_length = $current_length + $key; // change the new length for next split
                        $length += $key;
                        break;
                    }
                }
            }
            else
            {
                $new_length = $count;
                $length = $count - $current_length;
            }
        }
        else if($current_length == $count)
        {
            // this is the block that will run on the last call
            $new_length = $count;
            $length = $current_length - $old_length;
        }
    }
    else
        $new_length = $current_length;

    $split = array_slice($file, $old_length, $length);
    $old_length = $new_length; // change the old length
    $next_length = $new_length + $length; // change the next length
    file_put_contents($filename.".txt", implode('', $split)); // write the lines to a new file (file() keeps the line endings, so join with an empty string)
    if($new_length == $count)
    {
        // if we are on the last job don't continue anymore
        return true;
    }
    if($next_length > $count) // on the last call, make sure the length does not exceed the total
    {
        $next_length = $count;
    }
    split_file($old_length, $next_length, $boundary, ++$filename);
}

# Here we go
split_file($old_length = 0, $length, $boundary, 0);
?>
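The snippet above loads the whole file into memory with `file()`, which may be a problem for a 400 MB input. As an alternative, here is a minimal streaming sketch, not a drop-in replacement: it reads one line at a time with `fgets()` and starts a new output file only at a line that contains the boundary, and only once the current file already holds a minimum number of lines. The function name `split_at_boundary` and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: split $source into 0.txt, 1.txt, ... inside $outDir.
// A new file is started only at a line containing $boundary, and only once
// the current file already holds at least $minLines lines.
// Returns the number of files written.
function split_at_boundary(string $source, string $outDir, string $boundary, int $minLines): int
{
    $in = fopen($source, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }

    $out = null;   // handle of the output file currently being written
    $index = 0;    // number used for the next output file name
    $lines = 0;    // lines written to the current output file

    while (($line = fgets($in)) !== false) {
        $atBoundary = strpos($line, $boundary) !== false;

        // Open the first file, or rotate to a new one at a boundary line.
        if ($out === null || ($atBoundary && $lines >= $minLines)) {
            if ($out !== null) {
                fclose($out);
            }
            $out = fopen($outDir . '/' . $index . '.txt', 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot create output file $index");
            }
            $index++;
            $lines = 0;
        }

        fwrite($out, $line);
        $lines++;
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);

    return $index;
}
```

A call such as `split_at_boundary('your-file.txt', 'out', '# Hand', 500)` would write the chunks into an existing `out` directory. To aim for a size such as 400 KB rather than a line count, the counter could accumulate `strlen($line)` instead of being incremented by one.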

If the .txt file contains the following:

  # Hand 0: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

   # Hand 1: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

   # Hand 2: Knockout
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text


   # Hand 3: Knockout
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text


   # Hand 4: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text


   # Hand 5: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

then with the following settings the output is stored in 6 .txt files, as shown.

$length = 2; // or 3, 4, 5 (the result is the same)
$boundary = "# Hand";

Output:

0.txt

  # Hand 0: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

1.txt

  # Hand 1: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

2.txt

   # Hand 2: Knockout
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

3.txt

   # Hand 3: Knockout
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

4.txt

   # Hand 4: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text

5.txt

   # Hand 5: Tournament
   Lots of placeholder text
   Lots of placeholder text
   Lots of placeholder text