Question

I'm trying to cut down the time I spend editing some text files. Each one has about 10,000 lines, but I only need about 200 of them.

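The text files follow an almost fixed pattern, but they deviate from it now and then. My "focus" for picking the right lines to keep is that such a line always starts with z3455, followed by a variable part, for example: z3455 http://url.com/data1/data1.1/data1.3/ (342kb)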

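I already have an algorithm that captures the URLs and their content, but now I need a way to walk through the text file, delete every line except the z3455 lines, and then "push" those lines together so they are listed one below another.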

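I've tried different approaches in PHP but can't seem to find the right function. I can "isolate" specific line numbers, but because the files deviate from the pattern, that approach doesn't fully work.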

I hope someone can help me, either with code or by pointing me in the right direction to solve this.

Thanks in advance

Regards
- 梅斯蒂卡

Answer 1

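$in = fopen('file.txt', 'rb');
$out = fopen('filtered.txt', 'wb');
// fgets() returns one line at a time (fread() needs a length and ignores line breaks)
while (($line = fgets($in)) !== false) {
    if (preg_match('/^z3455 http.*$/', $line)) {
        fwrite($out, $line);
    }
}
fclose($in);
fclose($out);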

Of course, if you're running this from the command line, you could also skip PHP and use grep, which will be more efficient:

$ grep '^z3455 http' file.txt > filtered.txt
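If you'd rather stay in PHP, the streaming loop above can be wrapped in a reusable function. This is only a sketch: the helper name filterFileByPrefix() and the choice to return the number of kept lines are assumptions, not part of the original answer. It reads with fgets() so only one line is in memory at a time, and runs the prefix through preg_quote() so it is matched literally.

```php
<?php
// Hypothetical helper (name and signature are assumptions): copy every line of
// $inPath that starts with $prefix into $outPath, one below another, and
// return how many lines were kept. Lines are read one at a time with fgets().
function filterFileByPrefix(string $inPath, string $outPath, string $prefix): int
{
    $in = fopen($inPath, 'rb');
    $out = fopen($outPath, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException("Could not open $inPath or $outPath");
    }
    // Anchor the literal prefix at the start of the line
    $pattern = '/^' . preg_quote($prefix, '/') . '/';
    $kept = 0;
    while (($line = fgets($in)) !== false) {
        if (preg_match($pattern, $line)) {
            fwrite($out, $line); // the line keeps its own line break
            $kept++;
        }
    }
    fclose($in);
    fclose($out);
    return $kept;
}
```

For example, filterFileByPrefix('file.txt', 'filtered.txt', 'z3455 http') should keep the same lines as the grep command.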

Answer 2

This should do the trick. substr should be faster than a regular expression, especially on a large file.

$file = 'file.txt'; // path to the input file
foreach (file($file) as $line) {
    // Skip every line whose first five characters are not 'z3455'
    if (substr($line, 0, 5) !== 'z3455') {
        continue;
    }
    // $line is now a line of text that starts with 'z3455'.
    // Do with it whatever you need. If you want whatever comes
    // after z3455, you could then do $line = substr($line, 5);
}
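Following the last comment, here is a sketch that filters the lines and strips the prefix in one pass. The helper name afterPrefix() is hypothetical. It swaps substr() for strncmp() in the check, which compares the first few bytes in place instead of building a temporary substring; any speed difference on a 10,000-line file is untested.

```php
<?php
// Hypothetical helper: for each line that starts with $prefix, return the
// rest of the line with surrounding whitespace and the line break trimmed.
// Lines that do not start with $prefix are dropped.
function afterPrefix(array $lines, string $prefix): array
{
    $len = strlen($prefix);
    $result = [];
    foreach ($lines as $line) {
        // strncmp() returns 0 when the first $len bytes are equal
        if (strncmp($line, $prefix, $len) !== 0) {
            continue; // not a line we care about
        }
        $result[] = trim(substr($line, $len));
    }
    return $result;
}
```

Usage would look like $urls = afterPrefix(file('file.txt'), 'z3455');, which gives entries such as "http://url.com/data1/data1.1/data1.3/ (342kb)" ready for the URL-capturing step.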

Answer 3

Just the first thing that came to mind (very basic and untested):

<?php
$filename = 'foo.txt';
$file = file($filename);
$matchedLines = array();

foreach($file as $line) {
  if(preg_match('/^z3455/', $line)) {
    $matchedLines[] = $line;
  }
}
?>
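The loop above can also be collapsed into preg_grep(), which returns the array entries that match a pattern, and the result written straight back out with file_put_contents(), which accepts an array and writes its elements one after another. A minimal sketch with a hypothetical helper name, keepMatchingLines(); like the original it reads the whole file with file(), which should be acceptable for a file of around 10,000 lines.

```php
<?php
// Hypothetical helper: keep only the lines of $inPath that start with z3455
// and write them to $outPath, one below another. file() keeps each line's
// trailing newline, so the matched lines can be written back as-is.
function keepMatchingLines(string $inPath, string $outPath): int
{
    $lines = file($inPath);
    if ($lines === false) {
        throw new RuntimeException("Could not read $inPath");
    }
    // preg_grep() returns only the entries that match the pattern
    $matched = preg_grep('/^z3455/', $lines);
    // Given an array, file_put_contents() writes the elements in order
    file_put_contents($outPath, $matched);
    return count($matched);
}
```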

PHP: Loop through a text file and isolate lines with a specific "starting point"
