Question

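I need to read the last 25 lines of a file (to display the most recent log entries). Is there any way in Ruby to start at the end of a file and read it backwards?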

Answer 1

If you're on a *nix system that has tail, you can cheat like this:

last_25_lines = `tail -n 25 whatever.txt`

Answer 2

Is the file large enough that you need to avoid reading the whole thing? If not, you could just do

IO.readlines("file.log")[-25..-1]

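If it is large, you may need to use IO#seek to read from near the end of the file, and keep seeking back toward the beginning until you have seen 25 lines.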
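One caveat with the slice above: `[-25..-1]` evaluates to nil when the file has fewer than 25 lines, whereas `Array#last` simply returns whatever lines exist. A minimal sketch for the small-file case, where the helper name `last_lines` is made up for illustration:

```ruby
# Hypothetical helper: returns up to n trailing lines of a small file.
# It still reads the whole file, so it is only suitable when that is acceptable.
def last_lines(path, n = 25)
  # Array#last never returns nil; it yields fewer items if the file is short.
  File.readlines(path).last(n)
end
```

Something like `puts last_lines("file.log")` would then print the newest entries without a nil check.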

Answer 3

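Ruby has a library called File::Tail. It can fetch the last N lines of a file, just like the UNIX tail utility.

I assume the UNIX version of tail has some seek optimization in it, given a benchmark like this (tested on a text file of just over 11M):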

[john@awesome]$du -sh 11M.txt
11M     11M.txt
[john@awesome]$time tail -n 25 11M.txt
/sbin/ypbind
/sbin/arptables
/sbin/arptables-save
/sbin/change_console
/sbin/mount.vmhgfs
/misc
/csait
/csait/course
/.autofsck
/~
/usb
/cdrom
/homebk
/staff
/staff/faculty
/staff/faculty/darlinr
/staff/csadm
/staff/csadm/service_monitor.sh
/staff/csadm/.bash_history
/staff/csadm/mysql5
/staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
/staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
/staff/csadm/glibc-2.3.4-2.39.i386.rpm
/staff/csadm/csunixdb.tgz
/staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm

real    0m0.012s
user    0m0.000s
sys     0m0.010s

I can only imagine the Ruby library uses a similar approach.

Edit

For Pax's curiosity:

[john@awesome]$time cat 11M.txt | tail -n 25
/sbin/ypbind
/sbin/arptables
/sbin/arptables-save
/sbin/change_console
/sbin/mount.vmhgfs
/misc
/csait
/csait/course
/.autofsck
/~
/usb
/cdrom
/homebk
/staff
/staff/faculty
/staff/faculty/darlinr
/staff/csadm
/staff/csadm/service_monitor.sh
/staff/csadm/.bash_history
/staff/csadm/mysql5
/staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
/staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
/staff/csadm/glibc-2.3.4-2.39.i386.rpm
/staff/csadm/csunixdb.tgz
/staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm

real    0m0.350s
user    0m0.000s
sys     0m0.130s

Still under a second, but if there are a lot of file operations this makes a big difference.

Answer 4

An improved version of manveru's excellent seek-based solution. This one returns exactly n lines.

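class File
  # Returns the last n lines of the file as an array of strings.
  def tail(n)
    buffer = 1024
    idx = size
    chunks = []
    lines = 0

    # Read buffer-sized chunks backwards from the end until n + 1 newlines
    # have been seen (so the first of the n lines is complete) or the start
    # of the file is reached. Never seek to a negative offset.
    while lines < n + 1 && idx > 0
      to_read = [buffer, idx].min
      idx -= to_read
      seek(idx)
      chunk = read(to_read)
      lines += chunk.count("\n")
      chunks.unshift chunk
    end

    chunks.join.split(/\n/).last(n)
  end
end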

Answer 5

I just wrote a quick implementation with #seek:

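class File
  # Returns the last n lines of the file as a single string.
  def tail(n)
    buffer = 1024
    idx = size
    chunks = []
    lines = 0

    # Walk backwards in buffer-sized steps, never seeking before offset 0.
    while lines < n && idx > 0
      to_read = [buffer, idx].min
      idx -= to_read
      seek(idx)
      chunk = read(to_read)
      lines += chunk.count("\n")
      chunks.unshift chunk
    end

    # Note: stopping at n newlines means the earliest line returned may be
    # cut off if a chunk boundary falls in the middle of it.
    chunks.join.lines.last(n).join
  end
end

File.open('rpn-calculator.rb') do |f|
  p f.tail(10)
end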

Answer 6

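Here is a version of tail that doesn't store any buffers in memory as it goes, but instead uses "pointers". It also does bounds checking, so you don't end up seeking to a negative offset (for example, when there is more left to read but less than one chunk size remains).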

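# Returns the last n lines of the file at path as a single string.
# Assumes the file ends with a newline; otherwise one extra line is returned.
def tail(path, n)
  buffer_s = 512
  line_count = 0

  # The block form closes the file when we are done.
  File.open(path, "r") do |file|
    file.seek(0, IO::SEEK_END)
    offset = file.pos # we start at the end

    while line_count <= n && offset > 0
      # Never seek to a negative offset: read only what is left.
      to_read = [buffer_s, offset].min

      file.seek(offset - to_read)
      data = file.read(to_read)

      data.reverse.each_char do |c|
        offset -= 1
        next unless c == "\n"

        line_count += 1
        if line_count > n
          offset += 1 # step forward past the newline that precedes the tail
          break
        end
      end
    end

    file.seek(offset)
    file.read
  end
end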

Test cases are at https://gist.github.com/shaiguitar/6d926587e98fc8a5e301

Answer 7

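I can't vouch for Ruby, but most of these languages follow the C idiom for file I/O. That means there is no way to do what you ask other than searching. This usually takes one of two approaches.

1. Start at the beginning of the file and scan all of it, remembering the most recent 25 lines. Then, when you hit the end of the file, print them out.
2. A similar approach, but first try seeking to a best-guess position. That means seeking to (for example) end of file minus 4000 characters, then doing exactly what you did in the first approach, with the proviso that if you didn't get 25 lines, you have to back up and try again (e.g., to end of file minus 5000 characters).

The second way is the one I prefer, since if you choose your first offset wisely you will almost certainly need only one shot. Log files still tend to have a fixed maximum line length (I think coders still have a propensity for 80-column files long after their usefulness has degraded). I tend to choose the number of lines desired multiplied by 132 as my offset.

From a cursory glance at the Ruby docs online, it looks like Ruby does follow the C idiom. If you were to follow my advice, you would use "ios.seek(25*-132,IO::SEEK_END)" and then read forward from there.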
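The second approach could be sketched in Ruby roughly as follows. This is only an illustration under stated assumptions: the helper name `tail_by_guess` and its `guess_width` parameter are made up, and doubling the distance on a miss is one arbitrary way of "backing up and trying again". It also clamps the seek so a short file does not trigger a seek before offset 0.

```ruby
# Hypothetical helper sketching the "best guess offset" approach.
# guess_width (132) is the per-line estimate suggested in the answer.
# The file is opened in binary mode, so the returned strings are binary-encoded.
def tail_by_guess(path, n = 25, guess_width = 132)
  return [] if n <= 0

  File.open(path, "rb") do |io|
    size = io.size
    back = n * guess_width

    loop do
      back = size if back > size # never seek before the start of the file
      io.seek(-back, IO::SEEK_END)
      lines = io.read.lines

      # Unless we started at offset 0, the first line read may be partial,
      # so require n + 1 lines before trusting the last n.
      return lines.last(n) if back == size || lines.size > n

      back *= 2 # the guess was too small: back up further and retry
    end
  end
end
```

With a sensible `guess_width` the loop body should normally run once, which is the point of choosing the first offset wisely.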

Answer 8

How about:

file = []
File.open("file.txt").each_line do |line|
  file << line
end

file.reverse.each_with_index do |line, index|
  puts line if index < 25
end

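Performance would be terrible on a big file because it iterates twice. The better approach is the one already mentioned: read through the file, keep only the last 25 lines in memory, and display them. But this is just another idea.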
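The single-pass alternative mentioned above might look something like this. It is a minimal sketch, and the helper name `tail_single_pass` is hypothetical. It still reads the whole file, but it never holds more than n lines in memory.

```ruby
# Hypothetical helper: one forward pass, holding at most n lines in memory.
def tail_single_pass(path, n = 25)
  recent = []
  File.foreach(path) do |line|
    recent << line
    recent.shift if recent.size > n # drop the oldest line
  end
  recent
end
```

For example, `puts tail_single_pass("file.txt")` would print the lines in their original order.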

Reading the last n lines of a file in Ruby?