Question

I have an application that uses the Paperclip gem to store uploaded CSV files.

After upload, I want to stream the data from the uploaded file into code that reads it line by line and loads it into a data staging table in Postgres.
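For the Postgres side, one option is `COPY ... FROM STDIN`, which accepts its input in chunks, so each line can be forwarded as soon as it is read and Postgres does the CSV parsing (including quoted fields that span lines). A minimal sketch, assuming the pg gem's `PG::Connection` (`copy_data`, `put_copy_data`, `quote_ident`) and a file with a header row; the method name `copy_io_to_table` and the table name are hypothetical:

```ruby
# Sketch: stream any line-oriented IO into a Postgres table via COPY.
# Assumes `conn` is a PG::Connection from the pg gem (or any object with the
# same copy_data / put_copy_data / quote_ident methods).
# `copy_io_to_table` is a hypothetical helper name.
def copy_io_to_table(conn, table, io)
  # quote_ident guards against odd characters in the (unqualified) table name
  sql = "COPY #{conn.quote_ident(table)} FROM STDIN WITH (FORMAT csv, HEADER true)"
  conn.copy_data(sql) do
    # Only one line is held in Ruby at a time; Postgres parses the CSV stream.
    io.each_line { |line| conn.put_copy_data(line) }
  end
end
```

Because the raw bytes are passed through unchanged, no Ruby-side CSV parsing is needed for a straight load into a staging table.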

This is as far as I have gotten, where data_file.upload is the Paperclip CSV attachment:

io = StringIO.new(Paperclip.io_adapters.for(data_file.upload).read, 'r')

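Even though ^^ works, the problem, as you can see, is that it loads the entire file into memory as a single string. Large Ruby Strings, and the garbage they leave behind, are very bad for application performance.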

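Instead, I want a Ruby IO object that supports, for example, io.gets, so that the IO object handles buffering and cleanup and the whole file is never held in memory as one huge string.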
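Any IO-like object (File, Tempfile, StringIO, an open-uri response) can be consumed this way. A minimal sketch of such a line-by-line consumer; `each_nonblank_line` is a hypothetical helper name, not part of Paperclip:

```ruby
# Sketch: consume any IO-like object line by line with IO#gets.
# Only the current line (plus the IO's internal buffer) is in memory,
# unlike `io.read`, which builds one String for the whole file.
def each_nonblank_line(io)
  while (line = io.gets)
    next if line.strip.empty? # skip blank or whitespace-only lines

    yield line.chomp
  end
end
```

The same method works whether the IO comes from local disk or from a downloaded remote file.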

Thanks in advance for any suggestions!

Answer 1

With some help (from StackOverflow, of course), I was able to work this out myself.

In my Paperclip ActiveRecord model I now have the following:

require 'open-uri'

# Done this way so we get auto-closing of the File object
def yielding_upload_as_readable_file(&block)
  # It's quite annoying that there's not 1 method that works for both filesystem and S3 storage.
  # URI.open (open-uri) is needed for URLs: Kernel#open stopped opening them in Ruby 3.0.
  filesystem_storage? ? File.open(upload.path, &block) : URI.open(upload.url, &block)
end

def filesystem_storage?
  Paperclip::Attachment.default_options[:storage] == :filesystem
end

...and I use it in another model like this:

data_file.yielding_upload_as_readable_file do |file|
  file.each_line do |line|
    next if line.strip.empty? # skip blank lines
    # ... process line ...
  end
end
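Reading raw lines with gets/each_line splits a row incorrectly if a quoted CSV field contains a newline. Ruby's standard csv library can parse rows incrementally from the same open IO instead. A minimal sketch, assuming the file has a header row; `each_csv_row` is a hypothetical helper name:

```ruby
require 'csv'

# Sketch: parse CSV rows one at a time from an already-open IO.
# CSV.new wraps the IO and reads from it as rows are requested, so the
# whole file is never loaded into a single String.
def each_csv_row(io, &block)
  csv = CSV.new(io, headers: true, skip_blanks: true)
  csv.each(&block) # yields one CSV::Row per record
end
```

Inside the block above, this would replace the `each_line` loop with `each_csv_row(file) { |row| ... }`, where each `row` can be indexed by header name.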

How to get a Ruby IO stream for a Paperclip attachment?
