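I'm new to Julia and wrote a simple function that computes the RMSE (root mean square error). ratings is a matrix of ratings in which each row is [user, film, rating]. There are 15 million ratings. The rmse() function takes 12.0 seconds, while the Java implementation is about 188 times faster at 0.064 seconds. Why is the Julia implementation so slow? In Java I am working with an array of Rating objects; with a multidimensional int array it would be even faster.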
using DelimitedFiles  # provides readdlm on Julia 0.7 and later

ratings = readdlm("ratings.dat", Int32)

function predict(user, film)
    return 3.462
end

function rmse()
    total = 0.0
    for i in 1:size(ratings, 1)
        r = ratings[i,:]
        diff = predict(r[1], r[2]) - r[3]
        total += diff * diff
    end
    return sqrt(total / size(ratings)[1])
end
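Edit: after avoiding the global variable, it finishes in 1.99 seconds (31 times slower than Java). After removing r = ratings[i,:], it takes 0.856 seconds (13 times slower).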
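For reference, the two changes mentioned in the edit might look like the sketch below. The tiny ratings matrix is a made-up stand-in for ratings.dat, and the names rmse_slice and rmse_index are hypothetical, chosen only to put the two variants side by side.

```julia
# Made-up stand-in for ratings.dat: columns are user, film, rating.
ratings = Int32[1 10 4; 2 11 3; 3 12 5]

predict(user, film) = 3.462

# Variant with the row slice: r[i, :] allocates a new array on every iteration.
function rmse_slice(r)
    total = 0.0
    for i in 1:size(r, 1)
        row = r[i, :]
        diff = predict(row[1], row[2]) - row[3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# Variant with scalar indexing: no temporary array is created.
function rmse_index(r)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# Both functions take the matrix as an argument instead of reading a global.
println(rmse_slice(ratings))
println(rmse_index(ratings))
```

Both variants compute the same value; they differ only in whether a temporary row is allocated inside the loop.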
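Answer 0 (score: 10)
A few suggestions:
- Pass ratings in as an argument instead of reading it as a global.
- The r = ratings[i,:] line makes a copy, which is slow. Instead, use predict(r[i,1], r[i,2]) - r[i,3].
- square() may be faster than x*x. Try it.
- Look at the NumericExtensions.jl package, which provides insanely optimized functions for many common numeric operations (see the julia-dev list).

Answer 1 (score: 7)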
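For me, the following code runs in 0.024 seconds (and I doubt my laptop is much faster than your machine). I initialized ratings with the commented-out line, since I don't have the file you mentioned.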
function predict(user, film)
    return 3.462
end

function rmse(r)
    total = 0.0
    for i = 1:size(r,1)
        diff = predict(r[i,1],r[i,2]) - r[i,3]
        total += diff * diff
    end
    return sqrt(total / size(r,1))
end

# ratings = rand(1:20, 5000000, 3)
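One way such a timing could be taken is sketched below. The matrix size is an assumption, kept deliberately smaller than the data in the question so the example runs quickly, and the variable names first_run and second_run are illustrative. The first call usually includes compilation time, so it is measured separately from the second.

```julia
predict(user, film) = 3.462

function rmse(r)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# Random stand-in data; 100_000 rows is an arbitrary size chosen for a quick run.
ratings = rand(1:20, 100_000, 3)

first_run = @elapsed rmse(ratings)   # typically includes JIT compilation
second_run = @elapsed rmse(ratings)  # runs the already compiled method

println("first call:  ", first_run, " s")
println("second call: ", second_run, " s")
```

Comparing against another language is usually fairer with the second figure, since it excludes the one-time compilation cost.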
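Answer 2 (score: 5)
On my system, the problem seems to be that your constant predict function is not being optimized away. Replacing the redundant call to predict with its constant return value makes the code run in 0.01 seconds.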
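# Version 1: calls predict inside the loop.
function time()
    ratings = ones(15_000_000, 3)
    predict(user, film) = 3.462
    function rmse(ratings)
        total = 0.0
        for i in 1:size(ratings, 1)
            # Index the rating column of row i (the original used ratings[3]).
            diff = predict(ratings[i, 1], ratings[i, 2]) - ratings[i, 3]
            total += diff * diff
        end
        return sqrt(total / size(ratings, 1))
    end
    rmse(ratings)           # warm-up call so compilation is not timed
    @elapsed rmse(ratings)
end
time()

# Version 2: the call to predict is replaced by its constant value.
function time2()
    ratings = ones(15_000_000, 3)
    predict(user, film) = 3.462  # kept for parity with version 1, but unused
    function rmse(ratings)
        total = 0.0
        for i in 1:size(ratings, 1)
            diff = 3.462 - ratings[i, 3]
            total += diff * diff
        end
        return sqrt(total / size(ratings, 1))
    end
    rmse(ratings)           # warm-up call so compilation is not timed
    @elapsed rmse(ratings)
end
time2()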