Question

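I need help working with a file that contains roughly 46k lines, or more than 30 MB of data.

My initial idea was to open the file and turn each line into an array element. This worked the first time, when the array held about 32k values in total. The second time I repeated the process, the array held only 1011 elements, and the third time it could hold only 100 elements.

I'm confused and don't know much about how arrays work on the back end. Can someone explain what is happening and fix the code?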

 function file_to_array($cvsFile){

      $handle = fopen($cvsFile, "r");
      $path = fread($handle, filesize($cvsFile));
      fclose($handle);

      //Turn the file into an array and separate lines to elements
      $csv = explode(",", $path);

      //Remove common double spaces
      foreach ($csv as $key => $line){
         $csv[$key] = str_replace(' ', '', str_getcsv($line));
      }
      array_filter($csv);

      //get the row count for the file and array
      $rows = count($csv);
      $filerows = count(file($cvsFile)); //this no longer works

      echo "File has $filerows and array has $rows";

      return $csv;
 }

Answer 1

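The approach here can be split into two parts:

1. Optimized file reading and processing
2. A proper storage solution

Optimized file processing can be done like this: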

$handle = fopen($cvsFile, "r");
$rowsSucceed = 0;
$rowsFailed = 0;

if ($handle) {
    while (($line = fgets($handle)) !== false) { // Reading file by line
        // Process CSV line and check if it was parsed correctly
        // And count as you go
        if (!empty($parsedLine)) {
            $csv[$key] = ... ;
            $rowsSucceed++;
        } else {
            $rowsFailed++;
        }
    }

    fclose($handle);
} else {
    // Error handling
}

$totalLines = $rowsSucceed + $rowsFailed;

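In addition, you can avoid array_filter() altogether by simply not adding a processed line when it is empty.

This keeps memory usage down while the script is running.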
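The outline above leaves the parsing step open. One way it might be filled in is sketched below, assuming each record sits on a single line (no quoted fields containing line breaks). The function name read_csv_rows() and the shape of its return value are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: the name and the returned array layout are assumptions.
function read_csv_rows(string $csvFile): array
{
    $csv = [];
    $rowsSucceed = 0;
    $rowsFailed = 0;

    $handle = fopen($csvFile, "r");
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvFile");
    }

    while (($line = fgets($handle)) !== false) { // read one line at a time
        $line = trim($line);
        if ($line === '') {
            // Empty line: count it but never store it,
            // so no array_filter() pass is needed afterwards
            $rowsFailed++;
            continue;
        }
        // Parse the line into its fields (delimiter, enclosure, escape given explicitly)
        $csv[] = str_getcsv($line, ',', '"', '\\');
        $rowsSucceed++;
    }
    fclose($handle);

    return ['rows' => $csv, 'succeeded' => $rowsSucceed, 'failed' => $rowsFailed];
}
```

Because the counters are updated while reading, the total line count is simply succeeded plus failed, and the file never has to be read a second time with file().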

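Proper storage

Operating on this amount of data calls for proper storage. Reading the file repeatedly is inefficient and expensive. A simple file-based database such as sqlite can help a lot here and improve the overall performance of the script. To do that, you should probably load the CSV straight into the database and then run the counting operations on the parsed data, which avoids extra passes over the file just to count lines. It also has the further advantage that you can work with the data without keeping all of it in memory.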
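As a rough sketch of the database route, the function below streams a CSV into SQLite through PDO and lets SQL do the counting. It assumes the pdo_sqlite extension is available. The table name csv_rows, the column names, and the fixed three-column layout are all assumptions chosen for illustration, since the real file's structure is not shown.

```php
<?php
// Sketch only: table name, column names and the three-column layout are assumptions.
function import_csv_to_sqlite(string $csvFile, PDO $db): int
{
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS csv_rows (col1 TEXT, col2 TEXT, col3 TEXT)');
    $insert = $db->prepare('INSERT INTO csv_rows (col1, col2, col3) VALUES (?, ?, ?)');

    $handle = fopen($csvFile, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvFile");
    }

    $db->beginTransaction(); // a single transaction keeps bulk inserts fast
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row[0] === null) {
            continue; // blank line in the file
        }
        // Pad or cut the row to exactly three values to match the table
        $insert->execute(array_slice(array_pad($row, 3, null), 0, 3));
    }
    $db->commit();
    fclose($handle);

    // The row count now comes from the database, not from re-reading the file
    return (int) $db->query('SELECT COUNT(*) FROM csv_rows')->fetchColumn();
}
```

It could be called with an in-memory database for a quick trial, for example import_csv_to_sqlite($path, new PDO('sqlite::memory:')), or with a DSN such as 'sqlite:/some/path/data.db' to keep the data on disk. Only one row is held in PHP memory at a time.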

Answer 2

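Your problem is that you say you want to "turn each line into an array element", but that is definitely not what the code does. The code is clear: it reads the whole file into $path and then uses explode() to build one large flat array out of every element of every line. Later you try to run str_getcsv() on each item, which of course cannot work, because you have already exploded away all the commas.

It makes more sense to iterate over the file with fgetcsv():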
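The difference is easy to see on a tiny input. The two-line string below is made-up sample data standing in for the real file.

```php
<?php
// Made-up two-line sample standing in for the real file contents.
$contents = "a,b,c\n1,2,3\n";

// What the question's code does: split the whole file on commas.
$flat = explode(",", $contents);
// $flat is one flat list of 5 pieces; the piece "c\n1" straddles the line break.

// Splitting on line breaks first yields one element per line ...
$lines = explode("\n", trim($contents));

// ... and each line can then be parsed into its own array of fields.
$rows = array_map(function ($line) {
    return str_getcsv($line, ',', '"', '\\');
}, $lines);
// $rows holds 2 rows of 3 fields each.
```

Splitting on "\n" is only safe when no quoted field contains a line break, which is one more reason to prefer fgetcsv() on the file handle.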

function file_to_array($cvsFile) {
    $csv = [];
    $filerows = 0;
    $handle = fopen($cvsFile, "r");
    if ($handle === false) {
        // the file could not be opened, so there is nothing to parse
        return $csv;
    }
    while (($line = fgetcsv($handle)) !== false) {
        $filerows++;
        // skip empty lines (fgetcsv() returns [null] for a blank line)
        if ($line[0] === null) {
            continue;
        }
        //Remove common double spaces
        $csv[] = str_replace(' ', '', $line);
    }
    fclose($handle);

    //get the row count for the file and array
    $rows = count($csv);
    echo "File has $filerows and array has $rows";

    return $csv;
}
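If the rows only need to be walked through once, a generator is a possible variation on the same loop that avoids building the 46k-element array at all. This is a swapped-in technique rather than something the answer proposes, and the function name csv_rows() is hypothetical.

```php
<?php
// Hypothetical generator variant of the loop above; the name is an assumption.
function csv_rows(string $csvFile): Generator
{
    $handle = fopen($csvFile, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvFile");
    }
    try {
        while (($line = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($line[0] === null) {
                continue; // skip empty lines
            }
            // Hand back one cleaned row at a time instead of storing it
            yield str_replace(' ', '', $line);
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}
```

A caller would loop with foreach (csv_rows($path) as $row) { ... } and keep its own counter, so only the current row is in memory at any moment.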

PHP array processing capacity decreasing
