Question

文本文件中包含的数据（实际上是.dat）如下所示：

LIN*1234*UP*abcde*33*0*EA
LIN*5678*UP*fghij*33*0*EA
LIN*9101*UP*klmno*33*23*EA

文件中实际上有超过500,000条这样的行。

这就是我现在使用的：

//retrieve file once        
$file = file_get_contents('/data.dat'); 
$file = explode('LIN', $file);

    ...some code

foreach ($list as $item) { //an array containing 10 items
     foreach($file as $line) { //checking if these items are on huge list
         $info = explode('*', $line);
         if ($line[3] == $item[0]) {
             ...do stuff...                     
             break; //stop checking if found
          }
      }         
 }

问题是它运行得太慢 - 每次迭代大约1.5秒。我分别确认这不是......做的事情......＆＃39;这会影响速度。相反，它搜索正确的项目。

我怎样才能加快速度？谢谢。

Answer 1

如果每个项目都在自己的行上，而不是将整个内容加载到内存中，那么最好使用fgets()代替：

$f = fopen('text.txt', 'rt');

while (!feof($f)) {
    $line = rtrim(fgets($f), "\r\n");
    $info = explode('*', $line);
    // etc.
}

fclose($f);

PHP文件流被缓冲（~8kB），因此在性能方面应该是不错的。

另一条逻辑可以像这样重写（而不是多次迭代文件）：

if (in_array($info[3], $items)) // look up $info[3] inside the array of 10 things

或者，如果$items被适当地编入索引：

if (isset($items[$info[3]])) { ... }

Answer 2

执行file_get_contents时，它会将内容加载到内存中，因此您只能想象该过程可能会占用多少资源。更不用说你有一个嵌套循环，即(O)n^2

如果可能，您可以拆分文件，也可以使用fopen，fgets和fclose逐行阅读。

如果我是你，如果我真的需要速度，我会使用C++或Go之类的其他语言。

Answer 3

file_get_contents将整个文件作为数组加载到内存中。然后你的代码就会对它起作用。从the official PHP fgets documentation调整此示例代码应该会更好：

$handle = @fopen("test.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        $file_data = explode('LIN', $buffer);
        foreach($file_data as $line) {
            $info = explode('*', $line);
            $info = array_filter($info);
            if (!empty($info)) {
                echo '<pre>';
                print_r($info);
                echo '</pre>';
            }
        }         
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

使用您的数据输出上述代码是：

Array
(
    [1] => 1234
    [2] => UP
    [3] => abcde
    [4] => 33
    [6] => EA

)
Array
(
    [1] => 5678
    [2] => UP
    [3] => fghij
    [4] => 33
    [6] => EA

)
Array
(
    [1] => 9101
    [2] => UP
    [3] => klmno
    [4] => 33
    [5] => 23
    [6] => EA
)

但是仍然不清楚你所遗漏的代码，因为该行说明：

foreach ($list as $item) { //an array containing 10 items

这似乎是另一个真正的瓶颈。

PHP循环遍历庞大的文本文件非常慢，你能改进吗？

3 个答案: