Question

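My app exports an 11.5 MB CSV file and uses basically all of the server's RAM, which never gets freed.

The data for the CSV is taken from the DB, and in the case described above the whole thing is exported.

I am using the Ruby 2.4.1 standard CSV library in the following fashion:

export_helper.rb: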

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  data = Model.scope1(param).scope2(param).includes(:model1, :model2)
  data.each do |item|
    file << [
      item.method1,
      item.method2,
      item.method3
    ]
  end
  # repeat for other models - approx. 5 other similar loops
end

and then in the controller:

generator = ExportHelper::ReportGenerator.new
generator.full_report
respond_to do |format|
  format.csv do
    send_file(
      "#{Rails.root}/full_report.csv",
      filename: 'full_report.csv',
      type: :csv,
      disposition: :attachment
    )
  end
end

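After a single request, the puma process takes up 55% of the server's RAM and stays that way until the server eventually runs out of memory completely.

For instance, in this article, generating a million-line, 75 MB CSV file required only 1 MB of RAM. But there is no DB querying involved.

The server has 1015 MB RAM + 400 MB of swap memory.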

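So my questions are:

What exactly consumes so much memory? Is it the CSV generation or the communication with the DB?
Am I doing something wrong and missing a memory leak? Or is it just how the library works?
Is there a way to free up the memory without restarting the puma workers?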

Thanks in advance!

Answer 1

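You should use find_each instead of each. It is designed for exactly this case: it instantiates the models in batches (1000 records per batch by default) and releases each batch afterwards, whereas each instantiates all of the models at once.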

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  Model.scope1(param).find_each do |item|
    file << [
      item.method1
    ]
  end
end
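
To make the difference concrete, here is a minimal plain-Ruby sketch of the batching idea behind find_each. It is not ActiveRecord's implementation: FakeTable, fetch_batch and each_in_batches are hypothetical names, and the in-memory array stands in for a database table that is queried by primary key, one batch at a time.

```ruby
# Hypothetical stand-in for a database table; not part of ActiveRecord.
class FakeTable
  attr_reader :max_loaded

  def initialize(row_count)
    @rows = (1..row_count).map { |id| { id: id, name: "item#{id}" } }
    @max_loaded = 0 # largest number of rows handed out in a single batch
  end

  # Simulates "WHERE id > last_id ORDER BY id LIMIT batch_size".
  def fetch_batch(last_id, batch_size)
    batch = @rows.select { |row| row[:id] > last_id }.first(batch_size)
    @max_loaded = [@max_loaded, batch.size].max
    batch
  end

  # Yields every row, but only ever fetches one batch at a time.
  def each_in_batches(batch_size: 1000)
    last_id = 0
    loop do
      batch = fetch_batch(last_id, batch_size)
      break if batch.empty?
      batch.each { |row| yield row }
      last_id = batch.last[:id]
    end
  end
end

table = FakeTable.new(2500)
seen = 0
table.each_in_batches(batch_size: 1000) { |_row| seen += 1 }
puts "rows seen: #{seen}, largest batch: #{table.max_loaded}"
```

All 2500 rows are visited, yet no more than one batch of 1000 is fetched at a time, so earlier batches can be garbage collected while later ones are processed.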

Furthermore, you should stream the CSV to the browser instead of writing it to memory or disk first:

format.csv do
  headers["Content-Type"] = "text/csv"
  headers["Content-disposition"] = "attachment; filename=\"full_report.csv\""

  # streaming_headers
  # nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
  headers['X-Accel-Buffering'] = 'no'
  headers["Cache-Control"] ||= "no-cache"
  headers.delete("Content-Length")
  response.status = 200

  header = ['Method 1', 'Method 2']
  csv_options = { col_sep: ";" }

  csv_enumerator = Enumerator.new do |y|
    # **csv_options works whether to_s takes an options hash or keyword arguments
    y << CSV::Row.new(header, header).to_s(**csv_options)
    Model.scope1(param).find_each do |item|
      y << CSV::Row.new(header, [item.method1, item.method2]).to_s(**csv_options)
    end
  end

  # setting the body to an enumerator, rails will iterate this enumerator
  self.response_body = csv_enumerator
end
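
The Enumerator approach keeps memory flat because rows are built only when the consumer asks for them. A small sketch outside Rails illustrates this; csv_lines and the produced array are hypothetical names used only for the demonstration, and the numeric loop stands in for find_each.

```ruby
require 'csv'

# Hypothetical row source; in the answer above this role is played by find_each.
def csv_lines(row_count, produced)
  Enumerator.new do |y|
    y << CSV.generate_line(['Method 1', 'Method 2'], col_sep: ';')
    1.upto(row_count) do |i|
      produced << i # record that row i was actually built
      y << CSV.generate_line(["a#{i}", "b#{i}"], col_sep: ';')
    end
  end
end

produced = []
lines = csv_lines(1_000_000, produced)

# Nothing has been generated yet; the block runs only when lines are pulled.
first_three = lines.first(3)
puts first_three.join
puts "rows built so far: #{produced.size}"
```

Even though the enumerator describes a million rows, pulling the first three lines builds only the header and two data rows.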

Answer 2

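Besides using find_each, you should also try running the ReportGenerator code in a background job with ActiveJob. Since background jobs run in separate processes, the memory is released back to the OS when they are killed.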

So you could try something like this:

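A user requests some report (CSV, PDF, Excel)
Some controller enqueues a ReportGeneratorJob and displays a confirmation to the user
The job is performed and an email is sent with the download link/file.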
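
The memory argument can be illustrated without ActiveJob. The sketch below is plain Ruby, not an ActiveJob job: it uses Process.fork (Unix-like systems only) to build the report in a child process, so the memory the export needs belongs to the child and goes back to the OS when the child exits. generate_report, export_in_child_process and the file name are hypothetical. Whether an ActiveJob job really runs in a separate process depends on the queue adapter you configure.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical report builder; stands in for ReportGenerator#full_report.
def generate_report(path, row_count)
  CSV.open(path, 'wb') do |csv|
    csv << %w[id name]
    1.upto(row_count) { |i| csv << [i, "item#{i}"] }
  end
end

# Runs the export in a child process and waits for it to finish.
# Returns true when the child exited successfully.
def export_in_child_process(path, row_count)
  pid = Process.fork do
    generate_report(path, row_count)
  end
  _, status = Process.wait2(pid)
  status.success?
end

path = File.join(Dir.mktmpdir, 'full_report.csv')
ok = export_in_child_process(path, 10_000)
puts "export ok: #{ok}, lines written: #{File.foreach(path).count}"
```

The sketch waits for the child only to keep the demonstration simple; a real background job would let the request return immediately and notify the user later, as in the steps above.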

Answer 3

Note that you can easily improve the ActiveRecord side, but when the response is sent through Rails, all of it still ends up in a memory buffer in the Response object: https://github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/http/response.rb#L110

You also need to make use of the live streaming feature to pass the data to the client directly, without buffering: https://guides.rubyonrails.org/action_controller_overview.html#live-streaming-of-arbitrary-data
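
A rough sketch of the write-and-close pattern used with live streaming is shown below. It runs outside Rails, so a StringIO stands in for response.stream; in a real controller you would include ActionController::Live and pass response.stream instead. stream_csv and its arguments are hypothetical names.

```ruby
require 'csv'
require 'stringio'

# Writes each row to the stream as soon as it is generated and always
# closes the stream afterwards, as ActionController::Live expects.
def stream_csv(stream, header, rows)
  stream.write(CSV.generate_line(header))
  rows.each do |row|
    stream.write(CSV.generate_line(row))
  end
ensure
  stream.close
end

# StringIO is only a stand-in for response.stream in this sketch.
out = StringIO.new
rows = (1..3).lazy.map { |i| ["a#{i}", "b#{i}"] } # rows are built lazily
stream_csv(out, ['Method 1', 'Method 2'], rows)
puts out.string
```

Because each line is written as soon as it is built, the full CSV never has to exist as one string in the application's memory.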

rails - Exporting a huge CSV file consumes all the RAM in production
