Ruby

Time: 2015-09-19 21:33:43

Tags: ruby

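I have several log files from a Java application (Gigaspaces logs) running on multiple hosts, and I need to merge them based on the date/time value.

Since each log file is already sorted, I need to read the first record from each file into an array, determine which file has the record with the smallest key, write that record to the result file, read the next record from the same file & repeat.

Definition of a record: the first line has a key, and all following lines have no key, for example: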

2015-04-05 02:33:42,135 GSC SEVERE [com.gigaspaces.lrmi] - LRMI Transport Protocol caught server exception caused by [/10.0.1.2:46949] client.; Caused by: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:311)
    at com.gigaspaces.lrmi.SmartByteBufferCache.get(SmartByteBufferCache.java:50)
    at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelNoneBlocking(Reader.java:410)
    at com.gigaspaces.lrmi.nio.Reader.readBytesNonBlocking(Reader.java:644)
    at com.gigaspaces.lrmi.nio.Reader.bytesToStream(Reader.java:509)
    at com.gigaspaces.lrmi.nio.Reader.readRequest(Reader.java:112)
    at com.gigaspaces.lrmi.nio.ChannelEntry.readRequest(ChannelEntry.java:121)
    at com.gigaspaces.lrmi.nio.Pivot.handleReadRequest(Pivot.java:445)
    at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleRead(ReadSelectorThread.java:81)
    at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleConnection(ReadSelectorThread.java:45)
    at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.doSelect(AbstractSelectorThread.java:74)
    at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.run(AbstractSelectorThread.java:50)
    at java.lang.Thread.run(Thread.java:662)

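Ideally, the result file should contain the key, directory/filename.log & the rest of the record.

Questions:

  1. How do I get a record from a file in Ruby?
  2. How do I open multiple files and iterate over them using the algorithm above?

1 Answer:

Answer 0: (score: 1)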

Code

Read all lines that begin with a date-time string, from all files, into an array, then sort the array by the date-time string:

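require 'date'

def get_key_rows(*fnames)
  fnames.flat_map do |fname|
    IO.foreach(fname).with_object([]) do |s, arr|
      # A key line starts with a 19-character 'yyyy-mm-dd hh:mm:ss' stamp;
      # strptime raises on anything else, which the rescue turns into nil.
      dt = DateTime.strptime(s[0, 19], '%Y-%m-%d %H:%M:%S') rescue nil
      # Keep [date/time string, file name, rest of the line] for key lines only.
      arr << [s[0, 19], fname, s[19..-1].rstrip] if dt
    end
  end.sort_by(&:first)
end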

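This method returns an array of three-element arrays. Each three-element array corresponds to one key line in one of the files and contains the date/time string, the file name, and the remainder of the line after the date/time string. Note that the key lines need not be sorted within each file. The method uses IO.foreach, Enumerator#with_object, DateTime.strptime, Enumerable#flat_map and Enumerable#sort_by.

Regarding sort_by, note that the rows can be sorted by the date/time strings rather than by the corresponding DateTime objects, because the strings have the fixed-width form 'yyyy-mm-dd hh:mm:ss', with the most significant field first, so lexicographic order matches chronological order.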
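To illustrate the point about sorting, here is a minimal sketch that compares a plain string sort with a sort on parsed DateTime objects. The sample stamps are taken from the example files in this answer; the constant name FORMAT is only illustrative.

```ruby
require 'date'

# Same format string that get_key_rows passes to strptime.
FORMAT = '%Y-%m-%d %H:%M:%S'

stamps = ['2015-04-05 02:33:43', '2015-04-04 02:23:42',
          '2015-04-06 02:33:42', '2015-04-05 02:33:42']

# Lexicographic sort on the raw strings.
by_string = stamps.sort

# Chronological sort on the parsed values.
by_datetime = stamps.sort_by { |s| DateTime.strptime(s, FORMAT) }

puts by_string == by_datetime
```

Both orderings agree only because every field is zero-padded and has a fixed width; a format such as 'd/m/yyyy' would not have this property.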

Example

Let's create some files:

IO.write("f0", "2015-04-05 02:33:42,135 more stuff in f0\n" +
               "more in f0\n" +
               "2015-04-05 04:33:42,135 more stuff in f0\n" +
               "even more in f0")
  #=> 108

IO.write("f1", "2015-04-04 02:33:42,135 more stuff in f1\n" +
               "2015-04-06 02:33:42,135 more stuff in f1\n" + 
               "more in f1")
  #=> 92


IO.write("f2", "something in f2\n" +
               "2015-04-05 02:33:43,135 more stuff in f2\n" +
               "even more in f2\n" +
               "2015-04-04 02:23:42,135 more stuff in f2")
  #=> 113

get_key_rows('f0', 'f1', 'f2')
  #=> [["2015-04-04 02:23:42", "f2", ",135 more stuff in f2"],
  #    ["2015-04-04 02:33:42", "f1", ",135 more stuff in f1"],
  #    ["2015-04-05 02:33:42", "f0", ",135 more stuff in f0"],
  #    ["2015-04-05 02:33:43", "f2", ",135 more stuff in f2"],
  #    ["2015-04-05 04:33:42", "f0", ",135 more stuff in f0"],
  #    ["2015-04-06 02:33:42", "f1", ",135 more stuff in f1"]]
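Note that get_key_rows keeps only the key lines and loads everything into memory. If the continuation lines (such as stack traces) must stay with their key line, and each file is already sorted as the question states, the merge described in the question could be sketched roughly as follows. This is an illustration under assumptions, not part of the answer above: the names KEY_RE, records, exhausted? and merge_logs are hypothetical, the key is assumed to be the full 'yyyy-mm-dd hh:mm:ss,mmm' stamp at the start of a line, and lines that appear before the first key in a file are skipped.

```ruby
# Assumed key format: 'yyyy-mm-dd hh:mm:ss,mmm' at the start of a line.
KEY_RE = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/

# Hypothetical helper: lazily produces the records of one file as
# [key, file name, rest of the key line, continuation lines].
def records(fname)
  Enumerator.new do |out|
    record = nil
    File.foreach(fname) do |line|
      line = line.chomp
      if (m = KEY_RE.match(line))
        out << record if record          # previous record is complete
        record = [m[0], fname, m.post_match, []]
      elsif record
        record[3] << line                # continuation line, no key
      end
      # lines before the first key line are ignored
    end
    out << record if record              # last record of the file
  end
end

# True when the enumerator has no further record.
def exhausted?(enum)
  enum.peek
  false
rescue StopIteration
  true
end

# Repeatedly writes the record with the smallest key among the
# current head records, then advances only that file.
def merge_logs(fnames, out = $stdout)
  readers = fnames.map { |fname| records(fname) }
  readers.reject! { |r| exhausted?(r) }
  until readers.empty?
    reader = readers.min_by { |r| r.peek.first }
    key, fname, rest, more = reader.next
    out.puts "#{key} #{fname}#{rest}"
    more.each { |line| out.puts line }
    readers.delete(reader) if exhausted?(reader)
  end
end
```

A call might look like merge_logs(['host1/gs.log', 'host2/gs.log'], File.open('merged.log', 'w')), where the paths are placeholders. Only one pending record per file is held in memory at a time, so this should scale to large logs, but it relies on every input file being sorted by key.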