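I have several files from a Java application (Gigaspaces logs), collected from multiple hosts, that I need to merge based on the date/time value.
Since each log file is already sorted, I need to read the first record from each file into an array, determine which file has the record with the smallest key, write that record to the result file, fetch a new record from the same file, and repeat.
Definition of a record: the first line has a key, and all following lines have no key, e.g.: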
2015-04-05 02:33:42,135 GSC SEVERE [com.gigaspaces.lrmi] - LRMI Transport Protocol caught server exception caused by [/10.0.1.2:46949] client.; Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:311)
at com.gigaspaces.lrmi.SmartByteBufferCache.get(SmartByteBufferCache.java:50)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelNoneBlocking(Reader.java:410)
at com.gigaspaces.lrmi.nio.Reader.readBytesNonBlocking(Reader.java:644)
at com.gigaspaces.lrmi.nio.Reader.bytesToStream(Reader.java:509)
at com.gigaspaces.lrmi.nio.Reader.readRequest(Reader.java:112)
at com.gigaspaces.lrmi.nio.ChannelEntry.readRequest(ChannelEntry.java:121)
at com.gigaspaces.lrmi.nio.Pivot.handleReadRequest(Pivot.java:445)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleRead(ReadSelectorThread.java:81)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleConnection(ReadSelectorThread.java:45)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.doSelect(AbstractSelectorThread.java:74)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.run(AbstractSelectorThread.java:50)
at java.lang.Thread.run(Thread.java:662)
Ideally, the result file should contain the key, directory/filename.log, and the rest of the record.
Questions:
Answer 0 (score: 1)
Code
Read into an array all lines, from all files, that begin with a date/time string, then sort the array by the date/time string:
require 'date'

def get_key_rows(*fnames)
  fnames.flat_map do |fname|
    IO.foreach(fname).with_object([]) do |s, arr|
      # A key row starts with a 19-character date/time string; other rows are skipped.
      dt = DateTime.strptime(s[0, 19], '%Y-%m-%d %H:%M:%S') rescue nil
      arr << [s[0, 19], fname, s[19..-1].rstrip] if dt
    end
  end.sort_by(&:first)
end
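This method returns an array of three-element arrays. Each three-element array corresponds to one key row in one of the files and contains the date/time string, the file name, and the remainder of the row after the date/time string. Note that the key rows need not be sorted within each file. The method uses Enumerable#flat_map, IO.foreach, Enumerator#with_object, DateTime.strptime and Enumerable#sort_by.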
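Regarding sort_by, note that the rows can be sorted by the date/time strings themselves rather than by the corresponding DateTime objects, because the date/time strings have the fixed-width form 'yyyy-mm-dd hh:mm:ss'.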
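A quick way to check that claim, as a sketch only: the timestamps below are made up for illustration. Because every field is zero-padded, fixed-width, and ordered from most to least significant, sorting the strings should give the same order as sorting by the parsed DateTime objects.

```ruby
require 'date'

# Hypothetical timestamps, chosen only for illustration.
stamps = ['2015-04-05 02:33:42', '2015-04-04 02:33:42',
          '2015-04-05 02:33:43', '2015-04-04 02:23:42']

# Plain lexicographic sort of the strings.
by_string = stamps.sort

# Sort by the parsed DateTime objects instead.
by_datetime = stamps.sort_by { |s| DateTime.strptime(s, '%Y-%m-%d %H:%M:%S') }

# Prints whether the two orderings agree.
puts by_string == by_datetime
```

Sorting the strings directly also avoids parsing every key a second time.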
Examples
Let's create some files:
IO.write("f0", "2015-04-05 02:33:42,135 more stuff in f0\n" +
               "more in f0\n" +
               "2015-04-05 04:33:42,135 more stuff in f0\n" +
               "even more in f0")
  #=> 108
IO.write("f1", "2015-04-04 02:33:42,135 more stuff in f1\n" +
               "2015-04-06 02:33:42,135 more stuff in f1\n" +
               "more in f1")
  #=> 92
IO.write("f2", "something in f2\n" +
               "2015-04-05 02:33:43,135 more stuff in f2\n" +
               "even more in f2\n" +
               "2015-04-04 02:23:42,135 more stuff in f2")
  #=> 113
get_key_rows('f0', 'f1', 'f2')
#=> [["2015-04-04 02:23:42", "f2", ",135 more stuff in f2"],
# ["2015-04-04 02:33:42", "f1", ",135 more stuff in f1"],
# ["2015-04-05 02:33:42", "f0", ",135 more stuff in f0"],
# ["2015-04-05 02:33:43", "f2", ",135 more stuff in f2"],
# ["2015-04-05 04:33:42", "f0", ",135 more stuff in f0"],
# ["2015-04-06 02:33:42", "f1", ",135 more stuff in f1"]]
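get_key_rows returns only the key rows, so the keyless lines of each record (the stack trace in the question) are dropped. If whole records are wanted, one possible variation is sketched below. It is not part of the method above: the names KEY, records and merge_records are hypothetical, the key is assumed to be the full 23-character timestamp including milliseconds, and everything is still read into memory. Lines that come before the first key line of a file are discarded.

```ruby
require 'tmpdir'

# Assumption: a key line starts with a timestamp such as "2015-04-05 02:33:42,135".
KEY = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/

# Group the lines of one file into records: a key line plus the keyless
# lines that follow it. Each record becomes [key, file name, rest of record].
def records(fname)
  File.foreach(fname)
      .slice_before { |line| line.match?(KEY) }
      .select { |lines| lines.first.match?(KEY) } # drop lines before the first key
      .map { |lines| [lines.first[0, 23], fname, lines.join[23..-1]] }
end

# Collect the records of all files and order them by key. The index breaks
# ties, so records with equal keys keep their input order.
def merge_records(*fnames)
  fnames.flat_map { |f| records(f) }
        .sort_by.with_index { |(key, _fname, _rest), i| [key, i] }
end

# Small demonstration with made-up log files in a temporary directory.
Dir.mktmpdir do |dir|
  a = File.join(dir, 'a.log')
  b = File.join(dir, 'b.log')
  File.write(a, "2015-04-05 02:33:42,135 first in a\n  at trace line\n")
  File.write(b, "2015-04-04 02:33:42,135 first in b\n")
  merge_records(a, b).each do |key, fname, rest|
    puts "#{key} #{File.basename(fname)}#{rest}"
  end
end
```

Each output record starts with the key and the file name, followed by the rest of the record, which matches the layout asked for in the question. For logs too large to hold in memory, the line-by-line merge described in the question would be the better fit, since each input file is already sorted.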