How do I find the mean and standard deviation in Ruby?

Time: 2013-10-22 00:30:55

Tags: ruby average standard-deviation

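I wrote a program that finds the mean and standard deviation of a large data set stored in a separate txt file. I want this program to work with any data set. I tested it with two simple data points (a year and month paired with a temperature):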

2009-11,20
2009-12,10

When I run it, it reports an average of 20 and a standard deviation of 0 (obviously wrong).

Here is my program:

data = File.open("test.txt", "r+")
contents = data.read

contents = contents.split("\r\n")

#split up array
contents.collect! do |x|
  x.split(',')
end

sum = 0

contents.each do |x|
  #make loop to find average
  sum = sum  + x[1].to_f
end
avg = sum / contents.length
puts "The average of your large data set is: #{ avg.round(3)} (Answer is rounded to nearest thousandth place)"
#puts average

#similar to finding average, this finds the standard deviation
variance = 0
contents.each do |x|
  variance = variance + (x[1].to_f - avg)**2
end

variance = variance / contents.length
variance = Math.sqrt(variance)
puts "The standard deviation of your large data set is:#{ variance.round(3)} (Answer is rounded to nearest thousandth place)"

1 answer:

Answer 0 (score: 1)

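I think the problem comes from splitting the data on \r\n, which is OS-dependent: if you are on Linux, it should be contents.split("\n") (double quotes, because '\n' in single quotes is a literal backslash followed by n). When the file has only "\n" line endings, the split on "\r\n" never matches and the whole file stays one string. Splitting that string on ',' gives "20\n2009-12" as the second field, and to_f reads only the leading 20, which is why you get an average of 20 and a standard deviation of 0. Either way, it is probably better to iterate over each line of the file with IO#each and let Ruby handle the line-ending characters.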
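As a quick check of that diagnosis, here is a minimal sketch. It assumes the file uses Unix-style "\n" line endings and holds the two sample rows as an inline string; the names rows, values, and fixed are illustrative and not from the question.

```ruby
# Assumption: the file content uses "\n" line endings (no "\r").
contents = "2009-11,20\n2009-12,10\n"

# "\r\n" never occurs, so split returns the whole content as one element,
# and the comma split then produces three fields instead of two rows.
rows = contents.split("\r\n").map { |x| x.split(',') }
# rows == [["2009-11", "20\n2009-12", "10\n"]]

# String#to_f reads the leading number and ignores the rest of the string.
values = rows.map { |x| x[1].to_f }
# values == [20.0]

# Splitting on an optional "\r" followed by "\n" handles both conventions.
fixed = contents.split(/\r?\n/).map { |x| x.split(',')[1].to_f }
# fixed == [20.0, 10.0]
```

With a single value of 20.0 the mean is 20 and every deviation from it is 0, matching the output in the question. The IO#each version of the program: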

# Read-only mode is enough here; "r+" would also require write permission
data = File.open("test.txt", "r")

count = 0
sum = 0
variance = 0

data.each do |line|
  value = line.split(',')[1]
  sum = sum  + value.to_f
  count += 1
end

avg = sum / count
puts "The average of your large data set is: #{ avg.round(3)} (Answer is rounded to nearest thousandth place)"

# We need to get back to the top of the file
data.rewind

data.each do |line|
  value = line.split(',')[1]
  variance = variance + (value.to_f - avg)**2
end

variance = variance / count
variance = Math.sqrt(variance)
puts "The standard deviation of your large data set is: #{ variance.round(3)} (Answer is rounded to nearest thousandth place)"
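If the values fit in memory, the two passes can also be folded into one small helper. This is only a sketch under a few assumptions: mean_and_stddev is a hypothetical name, Enumerable#sum requires Ruby 2.4 or later, lines without a second field are skipped, and the result is the population standard deviation (dividing by N, as the code above does) rather than the sample one (dividing by N - 1).

```ruby
# Hypothetical helper (not from the original answer): returns
# [mean, population standard deviation] for the second comma-separated
# field of each line. Accepts any enumerable of lines.
def mean_and_stddev(lines)
  values = lines.map { |line| line.chomp.split(',')[1] }
                .compact      # drop blank lines and lines with no second field
                .map(&:to_f)
  return [nil, nil] if values.empty?

  mean = values.sum / values.length
  # Population variance: average squared deviation from the mean (divide by N)
  variance = values.sum { |v| (v - mean)**2 } / values.length
  [mean, Math.sqrt(variance)]
end

mean, stddev = mean_and_stddev(["2009-11,20\n", "2009-12,10\n"])
puts "Mean: #{mean.round(3)}, standard deviation: #{stddev.round(3)}"
# prints "Mean: 15.0, standard deviation: 5.0"
```

For a real file, passing File.foreach("test.txt") as the argument should work the same way, since without a block it returns an enumerator over the lines.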