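(Apologies in advance for the long question. The problem is actually simple, but explaining it may not be.)
My beginner-level PHP skills are being challenged by the following:
The input is 2 TXT files with a structure like this: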
$rowidentifier //number,letter,string etc..
$some semi-fixed-string $somedelimiter $semi-fixed-string
$content //unknown length: any number of strings or lines
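In the above, by "semi-fixed string" I mean a string with a KNOWN structure but UNKNOWN content.
To give a practical example, let's take an SRT file (I am only using it as a guinea pig, because its structure is very similar to what I need):
1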
00:00:12,759 --> 00:00:17,458
"some content here "
that continues here
2
00:00:18,298 --> 00:00:20,926
here we go again...
3
00:00:21,368 --> 00:00:24,565
...and this can go forever...
4
.
.
.
What I want to do is take the $content part from one file and put it in the right place in the second file.
Going back to the SRT example, having:
//file1
1
00:00:12,759 --> 00:00:17,458
"this is the italian content "
which continues in italian here
2
00:00:18,298 --> 00:00:20,926
here we go talking italian again ...
and
//file2
1
00:00:12,756 --> 00:00:17,433
"this is the spanish, chinese, or any content "
which continues in spanish, or chinese here
2
00:00:16,293 --> 00:00:20,96
here we go talking spanish, chinese or german again ...
would result in
//file3
1
00:00:12,756 --> 00:00:17,433
"this is the italian content "
which continues in italian here
"this is the spanish, chinese, or any content "
which continues in spanish, or chinese here
2
00:00:16,293 --> 00:00:20,96
here we go talking italian again ...
here we go talking spanish, chinese or german again ...
or, in more PHP-like terms:
$rowidentifier //unchanged
$some semi-fixed-string $somedelimiter $semi-fixed-string //unchanged, except maybe an option to choose if to keep file1 or file2 ...
$content //from file 1
$content //from file 2
So, after all this introduction, this is what I have (which is practically nothing...):
$first_file = file('file1.txt'); // no need to comment right ?
$second_file = file('file2.txt'); // see above comment
$result_array = array(); // construct array
foreach($first_file as $key=>$value) //loop array and....
$result_array[] = trim($value) . "\r" . trim($second_file[$key]); //..here is my problem ...
// $Value is $content - but LINE BY LINE , and in our case, it could be 2-3- or even 4 lines
// should i go by delimiters /n/r ?? (not a good idea - how can i know they are there ?? )
// or should i go for regex to lookup for string patterns ? that is insane , no ?
$fp = fopen('merge.txt', 'w+'); fwrite($fp, join("\r\n", $result_array)); fclose($fp);
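This goes line by line, which is not what I need. I need conditions... Also, I am sure this is not smart code and that there are better ways to do it, so any help would be appreciated...
Answer 0 (score: 3)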
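What you actually want to do is iterate over both files in parallel and then merge the parts that belong to each other.
But you cannot use the line numbers, because those may differ between the files. You need to use the number of the entry (block) instead. More precisely, you need to get one entry after the other out of each file.
So you need an iterator over the data that is able to turn a group of lines into a block.
So instead of: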
foreach($first_file as $number => $line)
it is
foreach($first_file_blocks as $number => $block)
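This can be done by writing your own iterator that takes the lines of a file as input and converts them into blocks. For that you need to parse the data. Here is a small example of a state-based parser that converts lines into blocks: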
$state = 0;
$blocks = array();
foreach($lines as $line)
{
    switch($state)
    {
        case 0:
            unset($block);
            $block = array();
            $blocks[] = &$block;
            $block['number'] = $line;
            $state = 1;
            break;
        case 1:
            $block['range'] = $line;
            $state = 2;
            break;
        case 2:
            $block['text'] = '';
            $state = 3;
            # fall-through intended
        case 3:
            if ($line === '') {
                $state = 0;
                break;
            }
            $block['text'] .= ($block['text'] ? "\n" : '') . $line;
            break;
        default:
            throw new Exception(sprintf('Unhandled %d.', $state));
    }
}
unset($block);
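It simply runs along the lines and changes its state. Based on that state, each line is processed as part of its block. When a new block starts, it is created. It works for the SRT file you outlined in your question (demo).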
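To make the state idea easier to try out, here is a minimal sketch that wraps a similar parser in a function. It is a variant, not the code above: the name parseSrtBlocks() is hypothetical, it collects blocks without references, it strips trailing EOL characters itself, and it assumes that an empty line separates two entries.

```php
<?php
// Hypothetical helper (not from the answer): turns SRT-like lines into blocks.
// Assumption: entries are separated by an empty line.
function parseSrtBlocks(array $lines)
{
    $blocks = array();
    $block  = null;
    $state  = 0; // 0 = expect number, 1 = expect range, 2 = collect text

    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        switch ($state) {
            case 0:
                if ($line === '') {
                    break; // skip stray empty lines between entries
                }
                $block = array('number' => $line, 'range' => '', 'text' => '');
                $state = 1;
                break;
            case 1:
                $block['range'] = $line;
                $state = 2;
                break;
            case 2:
                if ($line === '') {
                    // an empty line closes the current block
                    $blocks[] = $block;
                    $block = null;
                    $state = 0;
                } else {
                    $block['text'] .= ($block['text'] !== '' ? "\n" : '') . $line;
                }
                break;
        }
    }
    if ($block !== null) {
        $blocks[] = $block; // last block without a trailing empty line
    }
    return $blocks;
}

// Usage with the sample data from the question
$lines = array(
    '1', '00:00:12,759 --> 00:00:17,458', '"some content here "', 'that continues here', '',
    '2', '00:00:18,298 --> 00:00:20,926', 'here we go again...',
);
print_r(parseSrtBlocks($lines));
```

Comparing against '' instead of relying on truthiness avoids treating a text line such as "0" as empty.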
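To use this more flexibly, turn it into an iterator that takes $lines in its constructor and offers the blocks while iterating. This requires a small adaptation of how the parser gets at the lines, but it generally works the same way.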
class SRTBlocks implements Iterator
{
    private $lines;
    private $current;
    private $key;
    public function __construct($lines)
    {
        if (is_array($lines))
        {
            $lines = new ArrayIterator($lines);
        }
        $this->lines = $lines;
    }
    public function rewind()
    {
        $this->lines->rewind();
        $this->current = NULL;
        $this->key = 0;
    }
    public function valid()
    {
        return $this->lines->valid();
    }
    public function current()
    {
        if (NULL !== $this->current)
        {
            return $this->current;
        }
        $state = 0;
        $block = NULL;
        while ($this->lines->valid() && $line = $this->lines->current())
        {
            switch($state)
            {
                case 0:
                    $block = array();
                    $block['number'] = $line;
                    $state = 1;
                    break;
                case 1:
                    $block['range'] = $line;
                    $state = 2;
                    break;
                case 2:
                    $block['text'] = '';
                    $state = 3;
                    # fall-through intended
                case 3:
                    if ($line === '') {
                        $state = 0;
                        break 2;
                    }
                    $block['text'] .= ($block['text'] ? "\n" : '') . $line;
                    break;
                default:
                    throw new Exception(sprintf('Unhandled %d.', $state));
            }
            $this->lines->next();
        }
        if (NULL === $block)
        {
            throw new Exception('Parser invalid (empty).');
        }
        $this->current = $block;
        $this->key++;
        return $block;
    }
    public function key()
    {
        return $this->key;
    }
    public function next()
    {
        $this->lines->next();
        $this->current = NULL;
    }
}
Basic usage is as follows; the output can be seen in the Demo:
$blocks = new SRTBlocks($lines);
foreach($blocks as $index => $block)
{
    printf("Block #%d:\n", $index);
    print_r($block);
}
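As an aside, on PHP 5.5 or later a generator can express the same "lines in, blocks out" idea with less code. The following is only a sketch of that alternative, not the class above: the function name srtBlocks() is hypothetical, and it assumes that an empty line separates two entries.

```php
<?php
// Hypothetical generator variant (requires PHP 5.5+ for yield).
// $lines may be an array or any Traversable of lines.
function srtBlocks($lines)
{
    $block = null;
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            // an empty line ends the current block, if there is one
            if ($block !== null) {
                yield $block;
                $block = null;
            }
            continue;
        }
        if ($block === null) {
            // first non-empty line of an entry is its number
            $block = array('number' => $line, 'range' => null, 'text' => '');
        } elseif ($block['range'] === null) {
            // second line is the time range
            $block['range'] = $line;
        } else {
            // everything else is text, joined with newlines
            $block['text'] .= ($block['text'] !== '' ? "\n" : '') . $line;
        }
    }
    if ($block !== null) {
        yield $block; // final block when the input does not end with an empty line
    }
}

// Usage
$lines = array(
    "1\n", "00:00:12,759 --> 00:00:17,458\n", "\"some content here \"\n", "that continues here\n", "\n",
    "2\n", "00:00:18,298 --> 00:00:20,926\n", "here we go again...\n",
);
foreach (srtBlocks($lines) as $index => $block) {
    printf("Block #%d:\n", $index);
    print_r($block);
}
```

A generator is itself an Iterator, but it can only be traversed once, so call the function again for every new pass over the data.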
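So now it is possible to iterate over all the blocks in an SRT file. The only thing left is to iterate over both SRT files in parallel. Since PHP 5.3, the SPL ships with MultipleIterator, which does exactly this. It is now pretty straightforward; for the example I use the same lines twice: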
$multi = new MultipleIterator();
$multi->attachIterator(new SRTBlocks($lines));
$multi->attachIterator(new SRTBlocks($lines));
foreach($multi as $blockPair)
{
    list($block1, $block2) = $blockPair;
    echo $block1['number'], "\n", $block1['range'], "\n",
        $block1['text'], "\n", $block2['text'], "\n\n";
}
Storing the strings in a file (instead of outputting them) is rather trivial, so I leave it out of the answer.
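For completeness, the storing step might look roughly like the sketch below. The function name mergeBlocksToFile() and the output layout are assumptions, not part of the answer. It takes the number and range from the first file, as in the loop above; swap in $block2 if the timings of the second file should win. Any iterators that yield blocks with 'number', 'range' and 'text' keys will do, so plain ArrayIterator objects stand in for SRTBlocks here.

```php
<?php
// Hypothetical helper: merges two block iterators and writes the result to a file.
// Returns the number of bytes written, or false on failure (file_put_contents).
function mergeBlocksToFile(Iterator $first, Iterator $second, $targetFile)
{
    // Default flags (MIT_NEED_ALL): iteration stops as soon as one iterator runs out.
    $multi = new MultipleIterator();
    $multi->attachIterator($first);
    $multi->attachIterator($second);

    $entries = array();
    foreach ($multi as $blockPair) {
        list($block1, $block2) = $blockPair;
        $entries[] = implode("\n", array(
            $block1['number'],
            $block1['range'],
            $block1['text'],
            $block2['text'],
        ));
    }
    // One empty line between entries, as in SRT files.
    return file_put_contents($targetFile, implode("\n\n", $entries) . "\n");
}

// Usage with two tiny, hand-made block lists
$italian = new ArrayIterator(array(
    array('number' => '1', 'range' => '00:00:12,759 --> 00:00:17,458', 'text' => 'ciao'),
));
$spanish = new ArrayIterator(array(
    array('number' => '1', 'range' => '00:00:12,756 --> 00:00:17,433', 'text' => 'hola'),
));
$target = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'merge.txt';
mergeBlocksToFile($italian, $spanish, $target);
echo file_get_contents($target);
```

Collecting the entries in an array keeps the sketch short; for very large files, writing each entry with fwrite() inside the loop would avoid holding everything in memory.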
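So what is there to say? First, sequential data like the lines of a file can easily be parsed in a loop with some state. That works not only for lines in a file but for strings as well.
Second, why do I suggest an iterator here? First, it is easy to use: going from processing one file to processing two files in parallel is only a small step. Next, the iterator can itself operate on another iterator, for example the SplFileObject class, which provides an iterator over all lines of a file. If you have large files, you can use an SplFileObject (instead of an array) and you do not need to load both files into arrays first. That only requires a small addition to SRTBlocks that removes the trailing EOL characters from the end of each line: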
$line = rtrim($line, "\n\r");
And it just works:
$multi = new MultipleIterator();
$multi->attachIterator(new SRTBlocks(new SplFileObject($file1)));
$multi->attachIterator(new SRTBlocks(new SplFileObject($file2)));
foreach($multi as $blockPair)
{
    list($block1, $block2) = $blockPair;
    echo $block1['number'], "\n", $block1['range'], "\n",
        $block1['text'], "\n", $block2['text'], "\n\n";
}
Once that is done, you can process even really large files with (almost) the same code. Flexible, isn't it? The full demonstration
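If you would rather not touch SRTBlocks for the EOL handling, one possible alternative is to strip the line endings in a small wrapper before the lines reach the parser. This is only a sketch: the function name withoutEol() is hypothetical and it needs PHP 5.5 or later for generators.

```php
<?php
// Hypothetical wrapper: yields every line with trailing EOL characters removed.
// $lines may be an array or any Traversable, e.g. an SplFileObject.
function withoutEol($lines)
{
    foreach ($lines as $line) {
        yield rtrim($line, "\n\r");
    }
}

// Usage with an in-memory example instead of a real file
$raw = array("1\r\n", "00:00:12,759 --> 00:00:17,458\r\n", "some text\n", "\n");
foreach (withoutEol($raw) as $line) {
    var_dump($line);
}
```

Since a generator implements Iterator, it should be possible to pass withoutEol(new SplFileObject($file1)) to the SRTBlocks constructor. Keep in mind that a generator can only be traversed once, so a second pass over the file needs a fresh wrapper.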