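I wrote a program that computes the mean and standard deviation of a large data set stored in a separate txt file. I want the program to work with any data set. I tested it by entering two simple data points (a year and month paired with a temperature):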
2009-11,20
2009-12,10
When I run it, it reports a mean of 20 and a standard deviation of 0 (obviously wrong).
Here is my program:
data = File.open("test.txt", "r+")
contents = data.read
contents = contents.split("\r\n")
#split up array
contents.collect! do |x|
x.split(',')
end
sum = 0
contents.each do |x|
#make loop to find average
sum = sum + x[1].to_f
end
avg = sum / contents.length
puts "The average of your large data set is: #{ avg.round(3)} (Answer is rounded to nearest thousandth place)"
#puts average
#similar to finding average, this finds the standard deviation
variance = 0
contents.each do |x|
variance = variance + (x[1].to_f - avg)**2
end
variance = variance / contents.length
variance = Math.sqrt(variance)
puts "The standard deviation of your large data set is:#{ variance.round(3)} (Answer is rounded to nearest thousandth place)"
Answer 0 (score: 1)
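I think the problem comes from splitting the data on the OS-dependent \r\n: if you are on Linux, it should be contents.split("\n") (with double quotes, because '\n' in single quotes is a literal backslash followed by n, not a newline). Either way, it is probably better to iterate over each line of the file with IO#each and let Ruby handle the line-ending characters.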
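To illustrate the diagnosis, here is a small sketch that uses an in-memory string in place of test.txt. It assumes the file was saved with Unix "\n" line endings, which I cannot confirm from the question; the variable names raw, records and fixed are mine.

```ruby
# Simulated file contents with Unix line endings (an assumption about test.txt).
raw = "2009-11,20\n2009-12,10\n"

# Splitting on "\r\n" finds no separator, so the whole file stays one element.
records = raw.split("\r\n").map { |line| line.split(',') }
puts records.length       # 1 record instead of 2
puts records[0][1].to_f   # 20.0, because "20\n2009-12".to_f stops at the newline

# Note that '\n' in single quotes is two characters (backslash and n).
puts '\n'.length          # 2

# each_line plus chomp copes with both "\n" and "\r\n" endings.
fixed = raw.each_line.map { |line| line.chomp.split(',') }
puts fixed.length         # 2
```

With a single record whose value parses as 20.0, the mean is 20 and every deviation from it is 0, which matches the output reported in the question. The IO#each version follows.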
data = File.open("test.txt", "r") # read-only mode is enough; nothing is written
count = 0
sum = 0
variance = 0
data.each do |line|
value = line.split(',')[1]
sum = sum + value.to_f
count += 1
end
avg = sum / count
puts "The average of your large data set is: #{ avg.round(3)} (Answer is rounded to nearest thousandth place)"
# We need to get back to the top of the file
data.rewind
data.each do |line|
value = line.split(',')[1]
variance = variance + (value.to_f - avg)**2
end
variance = variance / count
variance = Math.sqrt(variance)
puts "The standard deviation of your large data set is: #{ variance.round(3)} (Answer is rounded to nearest thousandth place)"
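If the values fit in memory, the two passes over the file can be avoided by collecting the numbers first. The following is only a sketch: mean_and_stddev is a hypothetical helper name, it assumes every non-blank line has the form "label,value", and it keeps the population formula (dividing by the count) used above.

```ruby
# Hypothetical helper: returns [mean, population standard deviation]
# for an enumerable of lines shaped like "label,value".
def mean_and_stddev(lines)
  values = lines.map(&:chomp)            # strip "\n" or "\r\n"
                .reject(&:empty?)        # ignore blank lines
                .map { |line| line.split(',')[1].to_f }
  raise ArgumentError, "no data" if values.empty?

  mean = values.sum / values.length
  variance = values.sum { |v| (v - mean)**2 } / values.length
  [mean, Math.sqrt(variance)]
end

# The two test points from the question, with mixed line endings on purpose.
mean, stddev = mean_and_stddev(["2009-11,20\r\n", "2009-12,10\n"])
puts "mean=#{mean.round(3)} stddev=#{stddev.round(3)}"   # mean=15.0 stddev=5.0
```

For a real file, something like mean_and_stddev(File.foreach("test.txt")) should work, since File.foreach without a block returns an enumerator of lines. The trade-off is memory: this version holds all values at once, whereas the rewind approach reads the file twice but keeps only running totals.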