Reading a .csv file and doing simple statistics in Ruby

Time: 2012-11-29 07:33:50

Tags: ruby arrays csv

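I use JMeter to generate some load-test results. It outputs nicely formatted CSV files, but now I need to do some number crunching on them in Ruby. The beginning of a sample CSV file: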

threadName,grpThreads,allThreads,URL,Latency,SampleCount,ErrorCount
Thread Group 1-1,1,1,urlXX,240,1,0
Thread Group 1-1,1,1,urlYY,463,1,0
Thread Group 1-2,1,1,urlXX,200,1,0
Thread Group 1-3,1,1,urlXX,212,1,0
Thread Group 1-2,1,1,urlYY,454,1,0
.
.
.
Thread Group 1-N,1,1,urlXX,210,1,0
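A minimal sketch of reading this format with Ruby's standard `csv` library, assuming the header row shown above. It uses only the five complete sample rows (the elided `.` rows are left out), and the file name `data.csv` in the comment is an assumption. With `headers: true`, each row can be indexed by column name:

```ruby
require "csv"

# The five complete sample rows from above; on a real file you could use
# CSV.read("data.csv", headers: true) instead (file name assumed).
SAMPLE = <<~CSV_TEXT
  threadName,grpThreads,allThreads,URL,Latency,SampleCount,ErrorCount
  Thread Group 1-1,1,1,urlXX,240,1,0
  Thread Group 1-1,1,1,urlYY,463,1,0
  Thread Group 1-2,1,1,urlXX,200,1,0
  Thread Group 1-3,1,1,urlXX,212,1,0
  Thread Group 1-2,1,1,urlYY,454,1,0
CSV_TEXT

# headers: true turns each data line into a CSV::Row addressable by column name.
table = CSV.parse(SAMPLE, headers: true)

# Keep only the two columns the statistics need; Latency becomes an Integer.
rows = table.map { |row| [row["threadName"], Integer(row["Latency"])] }

rows.each { |name, latency| puts "#{name}: #{latency}" }
```

Letting the library split the lines means quoted fields containing commas are handled correctly, which a plain `split(",")` would not do.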

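Now, for the statistics, I need to read the first line of each thread group, add up the Latency fields, and divide by the number of thread groups I have to get the average latency. Then move on to the second line of each thread group, and so on...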
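To make the arithmetic concrete with the five sample rows: the first lines of thread groups 1-1, 1-2 and 1-3 have latencies 240, 200 and 212, so the first average is (240 + 200 + 212) / 3 ≈ 217.33. A small sketch with those per-group lists typed in by hand (the variable names are illustrative only):

```ruby
# Latencies per thread group, in file order (taken from the sample rows above).
latencies = {
  "Thread Group 1-1" => [240, 463],
  "Thread Group 1-2" => [200, 454],
  "Thread Group 1-3" => [212],
}

# Take the first line of every group, sum them, and divide by the number of groups.
first_lines = latencies.values.map(&:first)
avg_first = first_lines.sum.fdiv(latencies.size)

puts format("%.2f", avg_first) # prints 217.33
```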

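I was thinking that maybe I need to write some temporary sorted CSV files, one per thread group (the order in which the URLs are hit is always the same within a thread group), and then use them as input: add up the first lines, do the math, add up the second lines, and so on until there are no more lines.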
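Temporary per-group files are probably not needed: `Enumerable#group_by` keeps elements in their original order within each group, so grouping the parsed rows in memory gives the same per-thread-group sequences. A sketch, assuming the rows have already been reduced to `[threadName, latency]` pairs (the names `rows` and `by_group` are hypothetical):

```ruby
# [threadName, latency] pairs in file order (the five sample rows above).
rows = [
  ["Thread Group 1-1", 240],
  ["Thread Group 1-1", 463],
  ["Thread Group 1-2", 200],
  ["Thread Group 1-3", 212],
  ["Thread Group 1-2", 454],
]

# group_by preserves the original order inside each group, which plays the
# role of one "temporary sorted CSV file" per thread group.
by_group = rows.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

by_group.each { |name, lats| puts "#{name}: #{lats.inspect}" }
```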

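But because the number of thread groups changes, I can't figure out how to write the Ruby so that it handles that... Any code examples would be much appreciated :)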
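Nothing has to be hard-coded per group, so the number of thread groups can vary freely: the averages can be computed from a hash of any size. One possible approach (not the one used in the answer below) pads the shorter groups and uses `Array#transpose`, so column i holds the i'th line of every group. This sketch assumes that a missing line counts as 0 and that the sum is divided by the total number of groups, following the question's "divide by the number of thread groups". The method name `position_averages` is hypothetical:

```ruby
# Average latency of the i'th line across all thread groups.
# groups: Hash of thread-group name => latencies in file order.
# A group with fewer lines contributes 0 for its missing positions.
def position_averages(groups)
  return [] if groups.empty?

  width = groups.values.map(&:size).max
  # Pad every list to the same length so that transpose works.
  padded = groups.values.map { |lats| lats + [0] * (width - lats.size) }
  # After transpose, each column holds one line position across all groups;
  # integer division by the number of groups, like the answer below.
  padded.transpose.map { |column| column.sum / groups.size }
end

p position_averages({ "A" => [240, 463], "B" => [200, 454], "C" => [212] }) # [217, 305]
p position_averages({ "A" => [100], "B" => [300] })                          # [200]
```

The same method works unchanged for two, three or any other number of groups, because it only ever looks at `groups.size` and the longest list.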

1 answer:

Answer 0 (score: 1)

[Update] - I wonder, is this what you wanted?

How about this? It may be inefficient, but does it do what you want?

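# Read every line (without trailing newlines) and drop the header row.
# A lowercase name avoids shadowing Ruby's standard CSV class.
lines = File.readlines("data.csv", chomp: true)
lines.shift # minus the header.

# Hash where key is the group name; value is a list of hashes with keys {:grp, :lat},
# kept in file order, so index i holds that group's i'th line.
groups = lines.
  map { |l| # Turn every line into a hash of group name and its latency.
    fs = l.split(",")
    { grp: fs[0], lat: fs[4].to_i }
  }.
  group_by { |o| o[:grp] }

# The largest number of lines we have in any group (0 if there are no data rows).
# Note: max_by would return a [name, list] pair whose size is always 2,
# so look at the lists' sizes directly.
max_lines = groups.values.map(&:size).max.to_i

# avgs is a list of averages.
# avgs[0] is the average latency for all the first lines,
# avgs[1] is the average latency for all the second lines, etc.
avgs = (0...max_lines).map { |lno| # line number within each group
  # Latency of the lno'th line of each group; a group without that line counts as 0.
  total = groups.values.map { |l| l[lno] ? l[lno][:lat] : 0 }
  total.sum / groups.size # integer division by the number of thread groups
}

# So we have L averages, where L is the maximum number of
# lines in any group. You could do anything with this list
# of numbers... find the average again?
p avgs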

This should print something like:

[217, 305]  # average for the 1st lines, average for the 2nd lines