Question

I need to read a file in 1 MB chunks. Is there a cleaner way to do this in Ruby?

FILENAME="d:\\tmp\\file.bin"
MEGABYTE = 1024*1024
size = File.size(FILENAME)
open(FILENAME, "rb") do |io| 
  read = 0
  while read < size
    left = (size - read)
    cur = left < MEGABYTE ? left : MEGABYTE
    data = io.read(cur)
    read += data.size
    puts "READ #{cur} bytes" #yield data
  end
end

Answer 1

Adapted from the Ruby Cookbook, page 204:

FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024

class File
  def each_chunk(chunk_size = MEGABYTE)
    yield read(chunk_size) until eof?
  end
end

open(FILENAME, "rb") do |f|
  f.each_chunk { |chunk| puts chunk }
end

Disclaimer: I'm a Ruby newbie and haven't tested this.
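Since the snippet above is untested, here is a minimal sketch of the same idea that can be run against a throwaway file. It assumes a standalone helper, each_file_chunk (a hypothetical name, not from the Cookbook), instead of reopening File, and it returns an Enumerator when no block is given:

```ruby
require "tempfile"

# Hypothetical helper (not part of the original answer): yields successive
# chunks of the file at `path`, or returns an Enumerator without a block.
def each_file_chunk(path, chunk_size = 1024 * 1024)
  return enum_for(:each_file_chunk, path, chunk_size) unless block_given?

  File.open(path, "rb") do |f|
    # Same loop as the answer: read until the file reports end of file
    yield f.read(chunk_size) until f.eof?
  end
end

# Demo on a small temporary file, using 1000-byte chunks to keep it quick
Tempfile.create("chunks") do |tmp|
  tmp.binmode
  tmp.write("x" * 2500)
  tmp.flush
  p each_file_chunk(tmp.path, 1000).map(&:bytesize) # => [1000, 1000, 500]
end
```

The last chunk is simply whatever is left, so there is no need to compute the remaining size by hand as the question does.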

Answer 2

Alternatively, if you don't want to monkeypatch File:

until my_file.eof?
  do_something_with( my_file.read( bytes ) )
end

For example, streaming an uploaded tempfile into a new file:

# tempfile is a File instance
File.open( new_file, 'wb' ) do |f|
  # Read in small 64 KiB (65,536-byte) chunks to limit memory usage
  f.write(tempfile.read(2**16)) until tempfile.eof?
end
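A variation on the loop above, as a rough sketch: IO#read with a length returns nil at end of file, so the eof? check can be folded into the loop condition. The helper name copy_in_chunks and the file names are made up for illustration:

```ruby
require "tmpdir"

# Hypothetical helper: copies src_io to dst_path chunk by chunk and returns
# the number of bytes written.
def copy_in_chunks(src_io, dst_path, chunk_size = 2**16)
  written = 0
  File.open(dst_path, "wb") do |dst|
    # IO#read(n) returns nil once the source is exhausted, ending the loop
    while (chunk = src_io.read(chunk_size))
      written += dst.write(chunk)
    end
  end
  written
end

# Demo with throwaway files in a temporary directory
Dir.mktmpdir do |dir|
  src_path = File.join(dir, "upload.bin")
  dst_path = File.join(dir, "copy.bin")
  File.binwrite(src_path, "z" * 200_000)
  File.open(src_path, "rb") { |src| puts copy_in_chunks(src, dst_path) }
end
```

If all you need is the copy itself, IO.copy_stream(src, dst) from the standard library does it without an explicit loop.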

Answer 3

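You can use IO#each(sep, limit) and set sep to nil or an empty string (note that an empty string selects paragraph mode, so chunks also end at blank lines), for example: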

chunk_size = 1024
# Block form of File.open closes the file when the block finishes
File.open('/path/to/file.txt') do |f|
  f.each(nil, chunk_size) do |chunk|
    puts chunk
  end
end
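To see what this yields, here is a small sketch that records the chunk sizes. It opens the file in binary mode ("rb") so the limit is counted in plain bytes; chunk_sizes is a hypothetical helper name:

```ruby
require "tmpdir"

# Hypothetical helper: returns the byte size of every chunk that
# IO#each(nil, limit) yields for the file at `path`.
def chunk_sizes(path, limit)
  File.open(path, "rb") do |f|
    # With a nil separator, newlines do not end a chunk; only the limit does
    f.each(nil, limit).map(&:bytesize)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  File.binwrite(path, "line\n" * 500) # 2500 bytes, full of newlines
  p chunk_sizes(path, 1024)
end
```

Called without a block, each returns an Enumerator, which is why map can be chained directly here.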

Answer 4

If you look at the Ruby docs (http://ruby-doc.org/core-2.2.2/IO.html), there is an example like this:

IO.foreach("testfile") {|x| print "GOT ", x }

The only caveat: since this process can read the temp file faster than the stream that generates it, IMO a delay should be thrown in.

IO.foreach("/tmp/streamfile") {|line|
  ParseLine.parse(line)
  sleep 0.3 # pause, as this process will discontinue if it doesn't allow some buffering
}
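Note that foreach as used above reads line by line. A sketch of getting fixed-size chunks from the same method, assuming the documented foreach(name, sep, limit) form with a nil separator; foreach_chunks and the file name are hypothetical:

```ruby
require "tmpdir"

# Hypothetical helper: collects the chunks File.foreach yields when the
# separator is nil and a byte limit is given.
def foreach_chunks(path, limit)
  # mode: "rb" is passed through as an open option so bytes are read as-is
  File.foreach(path, nil, limit, mode: "rb").to_a
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "streamfile")
  File.binwrite(path, "GOT line\n" * 10) # 90 bytes
  foreach_chunks(path, 40).each { |chunk| puts "GOT #{chunk.bytesize} bytes" }
end
```

File.foreach opens and closes the file itself, so there is no handle to manage.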

Answer 5

FILENAME="d:/tmp/file.bin"

class File
  MEGABYTE = 1024*1024

  def each_chunk(chunk_size=MEGABYTE)
    yield self.read(chunk_size) until self.eof?
  end
end

open(FILENAME, "rb") do |f|
  f.each_chunk {|chunk| puts chunk }
end

It works, mbarkhau. I just moved the constant definition into the File class and added a few "self"s for clarity.

Read a file in chunks in Ruby
