I have a 1 GB file (file1) that I must read. I chose PHP to filter and change some lines and then create another file (file2) with those changes. The code is fine: if I read a 50 MB file it works, and file2 is produced with all the changes as expected. But when I try to run it on the 1 GB file, file2 is not created and I get an error message from the browser:
The connection to localhost was interrupted.
Check your Internet connection
Check any cables and reboot any routers, modems, or other network devices you may be using.
Allow Chrome to access the network in your firewall or antivirus settings.
If it is already listed as a program allowed to access the network, try removing it from the list and adding it again.
If you use a proxy server...
Check your proxy settings or contact your network administrator to make sure the proxy server is working. If you don't believe you should be using a proxy server: Go to the Chrome menu > Settings > Show advanced settings... > Change proxy settings... > LAN Settings and deselect "Use a proxy server for your LAN".
If I go back and run the small file, it runs fine again.
I have already set the PHP memory limit to 2048M, but I don't know whether that is enough or even possible.
So, what is a sensible way to handle this problem?
Note: the server is Apache on Windows 7, i7 with 8 cores, 64-bit, 16 GB RAM. I don't think the code matters, but someone asked to see it:
ini_set('memory_limit', '2048M');
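Side note: whether that 2048M limit actually takes effect at runtime, and how close the script gets to it, can be checked with PHP's built-in ini_get() and memory_get_peak_usage(); the snippet below is only a diagnostic sketch.

// Diagnostic sketch: print the effective memory limit and the script's peak usage.
echo ini_get('memory_limit'), "\n";             // e.g. "2048M" if the ini_set took effect
echo memory_get_peak_usage(true), " bytes\n";   // peak memory allocated so far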
Answer 0 (score: 2)
Instead of building the entire contents of the file you are going to write in memory (which requires a lot of memory), consider writing a stream filter.
A stream filter operates on the individual buffered reads coming from the underlying stream, typically around 8 kB of data at a time. The example code below defines such a filter: it splits each bucket into separate lines and calls your code to change them.
<?php
class myfilter extends \php_user_filter
{
    private $buffer; // internal buffer to create data buckets with
    private $pattern = ['/di_site/'];
    private $replace = ['au_site'];

    function filter($in, $out, &$consumed, $closing)
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $parts = preg_split('/(\n|\r\n)/', $bucket->data, -1, PREG_SPLIT_DELIM_CAPTURE);
            $buffer = '';
            // each line spans two array elements
            for ($i = 0, $n = count($parts); $i + 1 < $n; $i += 2) {
                $line = $parts[$i] . $parts[$i + 1];
                $buffer .= $this->treat_line($line);
                $consumed += strlen($line);
            }
            stream_bucket_append($out, stream_bucket_new($this->stream, $buffer));
        }
        return PSFS_PASS_ON;
    }

    /** THIS IS YOUR CODE **/
    function treat_line($line)
    {
        $line = trim($line, "\t\n\r\0\x0B");
        $firstChar = substr($line, 0, 1);
        if (ord($firstChar) <> 45) {
            if (preg_match("/di_site/", $line)) {
                $line = preg_replace($this->pattern, $this->replace, $line);
            }
        }
        return $line . "\n";
    }

    function onCreate()
    {
        $this->buffer = fopen('php://memory', 'r+');
    }

    function onClose()
    {
        fclose($this->buffer);
    }
}

stream_filter_register("myfilter", "myfilter");

// open input and attach filter
$in = fopen(__FILE__, 'r');
stream_filter_prepend($in, 'myfilter');

// open output stream and start copying
$out = fopen('php://stdout', 'w');
stream_copy_to_stream($in, $out);
fclose($out);
fclose($in);
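The demo above simply runs the filter over this script file and writes the result to stdout. Adapting it to the question's scenario would look roughly like the sketch below, assuming the source is file1 and the filtered copy should land in file2 (both names taken from the question; adjust the paths as needed):

// Sketch: filter file1 into file2 using the "myfilter" registered above.
$in  = fopen('file1', 'r');
$out = fopen('file2', 'w');
stream_filter_prepend($in, 'myfilter');   // changes are applied while the data is read
stream_copy_to_stream($in, $out);         // copies in small chunks, so memory use stays flat
fclose($in);
fclose($out);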