I have a file that looks like this:
# Hand 1: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 2: Knockout
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 3: Knockout
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 4: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 5: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
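The file is 400 MB and I want to split it into files of roughly 400 KB each. I'm not sure whether there is a way to split by size in KB, so I decided to split by line count instead; it doesn't need to be exact, as long as each file ends up between 400 and 500 KB.
I don't want a split to land in the middle of a block, so once it reaches 100 lines I want it to split just before the next # Hand line, so that I never end up with an incomplete block.
I've tried the code below, but I get an error, and I can't find much documentation online to compare mine against. Any idea what is wrong?
The end goal is to split after x lines and produce a large number of 400 KB files instead of one 400 MB file.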
<?php
$filename = fopen('files/text1.txt','r');
$keyword = 'Hand #';
$contents = stream_get_line($filename,100,$keyword);
?><pre><?php
print_r($contents);
?></pre><?php
?>
The error I get is: PHP Warning: stream_get_line() expects parameter 1 to be resource
Answer 0 (score: 0)
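On the warning itself: stream_get_line() expects an open stream as its first argument, and fopen() returns false when it cannot open the file, so the most likely cause is that 'files/text1.txt' could not be opened from where the script runs. Two further things to check: the second argument is a maximum length in bytes, not a line count, and the delimiter 'Hand #' never occurs in a file whose headers read '# Hand'. A minimal sketch of checking the handle before reading follows; open_for_reading is a hypothetical helper name, and the temporary file only exists so the example runs on its own.

```php
<?php
// Hypothetical helper (not part of PHP): open a file for reading or fail loudly.
function open_for_reading(string $path)
{
    // fopen() emits a warning and returns false on failure; @ silences the
    // warning so the failure is reported once, through the exception below.
    $handle = @fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path (cwd: " . getcwd() . ")");
    }
    return $handle;
}

// Demo data in a temporary file (an assumption, standing in for files/text1.txt).
$demo = tempnam(sys_get_temp_dir(), 'hand');
file_put_contents($demo, "# Hand 1: Tournament\ntext\n# Hand 2: Knockout\ntext\n");

$handle = open_for_reading($demo);
// Read at most 100 bytes, stopping early if the delimiter is found.
// The delimiter itself is not included in the returned string.
$first = stream_get_line($handle, 100, "# Hand 2");
fclose($handle);
unlink($demo);

echo $first;
```

If the exception fires, comparing the reported working directory with the relative path is usually the quickest way to see why the file was not found.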
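I wrote my own recursive snippet and ran it through plenty of test cases; the best result is shown at the end of this post. I hope the comments tell you what they should.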
<?php
/* Snippet by Revo */
$file = file('your-file.txt'); // file contents go into an array, one line per element
$count = count($file); // total number of lines
$length = 500; // minimum number of lines per split (here every 500 lines, but the split may land on line 501, 502, 503, ... depending on where the next boundary is)
$boundary = "# Hand"; // boundary word
/**
 * [split_file split the whole txt file]
 * @param [int] $old_length [length to begin job, default is 0]
 * @param [int] $current_length [length to end job]
 * @param [string] $boundary [group of characters that we check on every call]
 * @param [int] $filename [used to generate automated numeric file names]
 * @return [true] [always returns true]
 */
function split_file($old_length, $current_length, $boundary, $filename)
{
    global $count, $file, $length;
    if(!isset($file[$current_length]) || strpos($file[$current_length], $boundary) === false) // check whether the chunk can end at the current length
    {
        if($current_length < $count) // check if current length is lower than total
        {
            if($length <= $count - $current_length)
            {
                $split = array_slice($file, $current_length, $count); // remaining lines, scanned for the nearest line that contains the boundary word
                $new_length = $count; // fallback when no further boundary exists: take everything that is left
                $found = false;
                foreach ($split as $key => $value)
                {
                    if(strpos($value, $boundary) !== false)
                    {
                        $new_length = $current_length + $key; // change the new length for next split
                        $length += $key;
                        $found = true;
                        break;
                    }
                }
                if(!$found)
                {
                    $length = $count - $old_length;
                }
            }
            else
            {
                $new_length = $count;
                $length = $count - $old_length; // everything from the chunk start to the end of the file
            }
        }
        else
        {
            // this is the block that will run on the last call (end of file reached or passed)
            $new_length = $count;
            $length = $count - $old_length;
        }
    }
    else
    {
        $new_length = $current_length;
    }
    $split = array_slice($file, $old_length, $length);
    $old_length = $new_length; // change the old length
    $next_length = $new_length + $length; // change the next length
    file_put_contents($filename.".txt", implode('', $split)); // write the chunk to a new file (file() keeps line endings, so no separator is needed)
    if($new_length == $count)
    {
        // if we are on the last job don't continue anymore
        return true;
    }
    if($next_length > $count) // on the last call, make sure the length does not go beyond the total
    {
        $next_length = $count;
    }
    return split_file($old_length, $next_length, $boundary, ++$filename);
}
# Here we go
split_file($old_length = 0, $length, $boundary, 0);
?>
If the .txt file contains the following:
# Hand 0: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 1: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 2: Knockout
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 3: Knockout
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 4: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
# Hand 5: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
With the following settings, the output is stored in 6 .txt files, as shown.
$length = 2; // Or 3,4,5 [without changing in result]
$boundary = "# Hand";
Output:
0.txt
# Hand 0: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
1.txt
# Hand 1: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
2.txt
# Hand 2: Knockout
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
3.txt
# Hand 3: Knockout
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
4.txt
# Hand 4: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
5.txt
# Hand 5: Tournament
Lots of placeholder text
Lots of placeholder text
Lots of placeholder text
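Since file() loads the entire file into memory, a 400 MB input may run into PHP's memory limit. As an alternative, here is a rough sketch that streams the file with fgets() and splits by size in bytes instead of by line count, which is closer to the 400 KB target in the question. It is a different technique from the recursive snippet above; split_by_size, its parameters, and the temporary demo directory are all hypothetical names chosen for illustration.

```php
<?php
/**
 * Hypothetical helper: stream $source line by line and write numbered chunk
 * files into $outDir. A new chunk starts at the first line beginning with
 * $boundary once the current chunk holds at least $minBytes.
 * Returns the number of chunk files written.
 */
function split_by_size(string $source, string $outDir, int $minBytes, string $boundary = '# Hand'): int
{
    $in = fopen($source, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }

    $out = null;   // handle of the chunk currently being written
    $written = 0;  // bytes written to the current chunk
    $index = 0;    // number of chunk files opened so far

    while (($line = fgets($in)) !== false) {
        $atBoundary = strncmp($line, $boundary, strlen($boundary)) === 0;
        // Rotate only on a boundary line, so a block is never cut in half.
        if ($out === null || ($atBoundary && $written >= $minBytes)) {
            if ($out !== null) {
                fclose($out);
            }
            $out = fopen($outDir . '/' . $index . '.txt', 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot create chunk $index in $outDir");
            }
            $index++;
            $written = 0;
        }
        fwrite($out, $line);
        $written += strlen($line);
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $index;
}

// Demo with made-up data and a tiny threshold so it runs anywhere.
$dir = sys_get_temp_dir() . '/hands_' . uniqid();
mkdir($dir);
$source = $dir . '/source.txt';
$data = '';
for ($i = 1; $i <= 6; $i++) {
    $data .= "# Hand $i: Tournament\n" . str_repeat("Lots of placeholder text\n", 3);
}
file_put_contents($source, $data);

$chunks = split_by_size($source, $dir, 150);
echo "$chunks chunk files written\n";
```

For the real file, passing 400 * 1024 as $minBytes would be the analogous setting. Every chunk except possibly the last is then at least that large and overshoots by less than one block, because the split waits for the next boundary line.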