In Chez Scheme, this double loop is 50 times slower than in C++ (compiled with --optimize-level 3 and -O3, respectively).
(import
  (rnrs)
  (rnrs r5rs))
(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (vector-set! a i (cons (cos i) (sin i))))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (do ((j 0 (+ j 1)))
        ((= j n) #f)
      (let ((ai (vector-ref a i))
            (aj (vector-ref a j)))
        (set! acc (+ acc (+ (* (car ai) (cdr aj))
                            (* (cdr ai) (car aj))))))))
  (write acc)
  (newline))
(exit)
vs
#include <iostream>
#include <cmath>
#include <vector>
#include <algorithm>
typedef std::pair<double, double> pr;
typedef std::vector<pr> vec;
double loop(const vec& a)
{
    double acc = 0;
    const int n = a.size();
    for(int i = 0; i < n; ++i)
        for(int j = 0; j < n; ++j)
        {
            const pr& ai = a[i];
            const pr& aj = a[j];
            acc += ai.first * aj.second +
                   ai.second * aj.first;
        }
    return acc;
}
int main()
{
    const int n = 1024 * 16;
    vec v(n);
    for(int i = 0; i < n; ++i)
        v[i] = pr(std::cos(i), std::sin(i));
    std::cout << loop(v) << std::endl;
}
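I realize that there is more memory indirection in Scheme than in C++, but the performance difference is still surprising...
Is there a simple way to speed up the Scheme version? (Without changing the memory layout to something totally unidiomatic.)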
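One way to gauge how much of the gap comes from indirection alone is to give the C++ version a Scheme-like layout. The following is only a rough sketch: `BoxedPair`, `boxed_vec`, `make_boxed` and `loop_boxed` are names made up for this illustration, and the assumption that every pair and every flonum lives in its own heap allocation is a simplification of what a Scheme implementation may actually do.

```cpp
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of a Scheme pair holding two boxed flonums:
// each field is a separately heap-allocated double.
struct BoxedPair {
    std::unique_ptr<double> car;
    std::unique_ptr<double> cdr;
};

// The vector stores pointers to pairs, not the pairs themselves.
using boxed_vec = std::vector<std::unique_ptr<BoxedPair>>;

// Fill the vector with (cos i, sin i), as in the original programs.
boxed_vec make_boxed(std::size_t n)
{
    boxed_vec v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        auto p = std::make_unique<BoxedPair>();
        p->car = std::make_unique<double>(std::cos(static_cast<double>(i)));
        p->cdr = std::make_unique<double>(std::sin(static_cast<double>(i)));
        v.push_back(std::move(p));
    }
    return v;
}

// Same double loop as before, but each operand is two pointer hops away.
double loop_boxed(const boxed_vec& a)
{
    double acc = 0.0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const BoxedPair& ai = *a[i];
            const BoxedPair& aj = *a[j];
            acc += *ai.car * *aj.cdr + *ai.cdr * *aj.car;
        }
    return acc;
}
```

Timing `loop_boxed(make_boxed(1024 * 16))` against the flat `loop` above would give a rough idea of the share of the slowdown due to pointer chasing, as opposed to generic arithmetic or type checks.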
Answer 0 (score: 2)
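So while these programs do look the same, they are not the same. The C++ version uses fixed-width machine arithmetic (int loop counters and double values), while the Scheme version uses the standard numeric tower. To make the C++ version more like the Scheme one, try using a bignum library for the calculations.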
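To make the numeric tower point concrete, here is a rough C++17 sketch of what generic arithmetic costs. `Num`, `to_double`, `generic_add` and `fl_add` are names invented for this illustration. No Scheme implementation is claimed to work exactly this way, and a real tower also has to handle bignums, rationals and integer overflow, which are left out here.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-in for a dynamically typed Scheme number:
// either an exact integer or an inexact double.
using Num = std::variant<std::int64_t, double>;

// Convert whichever alternative is stored to double.
double to_double(const Num& x)
{
    return std::visit([](auto v) { return static_cast<double>(v); }, x);
}

// Generic addition: the tags of both operands are inspected at run time
// before the actual add happens. Overflow handling is omitted.
Num generic_add(const Num& x, const Num& y)
{
    if (std::holds_alternative<std::int64_t>(x) &&
        std::holds_alternative<std::int64_t>(y))
        return std::get<std::int64_t>(x) + std::get<std::int64_t>(y);
    // Mixed or inexact operands: the result is inexact.
    return to_double(x) + to_double(y);
}

// Type-specialised addition, loosely analogous to fl+:
// no tag dispatch, just one machine instruction.
double fl_add(double x, double y)
{
    return x + y;
}
```

The C++ inner loop only ever does the equivalent of `fl_add`, whereas a generic `+` has to go through something like `generic_add` on every iteration unless the compiler can prove the operand types.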
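As a test, I replaced the arithmetic with the operations from (rnrs arithmetic flonums) and (rnrs arithmetic fixnums), which halved the execution time in DrRacket. I expect the same to happen in any implementation.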
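Now, my initial tests showed the C code running about 25 times faster, not 50 as expected, and by switching to floating-point arithmetic I am down to C being about 15 times faster.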
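I think I can make it even faster by using unsafe procedures. Scheme checks the type of each argument at run time, so it does extra work before each operation, which does not happen in the C version. As a test, I changed the code to use my implementation's unsafe procedures, and now it is only 10 times slower.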
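A loose C++ analogy for safe versus unsafe procedures is `std::vector::at` versus `operator[]`. This is only a sketch of the idea: `loop_checked` and `loop_unchecked` are names made up here, and `at` only checks bounds, whereas a safe Scheme `vector-ref`, `car` or `fl+` also checks the types of its arguments.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using pr = std::pair<double, double>;
using vec = std::vector<pr>;

// "Safe" variant: at() validates every index at run time and
// throws std::out_of_range on a bad one.
double loop_checked(const vec& a)
{
    double acc = 0.0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const pr& ai = a.at(i);
            const pr& aj = a.at(j);
            acc += ai.first * aj.second + ai.second * aj.first;
        }
    return acc;
}

// "Unsafe" variant: operator[] performs no check, so the caller
// must guarantee that the indices are valid.
double loop_unchecked(const vec& a)
{
    double acc = 0.0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const pr& ai = a[i];
            const pr& aj = a[j];
            acc += ai.first * aj.second + ai.second * aj.first;
        }
    return acc;
}
```

Both functions return the same value for valid input. The difference is only in what happens per access, which is the kind of overhead the unsafe Scheme procedures remove.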
Hope it helps in Chez as well :)
EDIT
Here is my modified source, which makes it about 2 times faster:
#!r6rs
(import
  (rnrs)
  ;; import the * and + that only work on floats (which are faster, but they still check their arguments)
  (only (rnrs arithmetic flonums) fl+ fl*))
(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0.0)) ; We want a float, so let's tell Scheme about that!
  ;; using inexact f instead of integer i
  ;; makes every result of cos and sin inexact
  (do ((i 0 (+ i 1))
       (f 0.0 (+ f 1)))
      ((= i n) #f)
    (vector-set! a i (cons (cos f) (sin f))))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (do ((j 0 (+ j 1)))
        ((= j n) #f)
      (let ((ai (vector-ref a i))
            (aj (vector-ref a j)))
        ;; use float versions of + and *
        ;; since this is where most of the time is spent
        (set! acc (fl+ acc
                       (fl+ (fl* (car ai) (cdr aj))
                            (fl* (cdr ai) (car aj))))))))
  (write acc)
  (newline))
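Here is an implementation-specific version (which locks you in to Racket), just to show that run-time type checking does have an impact. This code runs 30% faster than the previous optimization: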
#lang racket
;; this imports the * and + for floats as unsafe-fl* etc.
(require racket/unsafe/ops)
(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0.0)) ; We want a float, so let's tell Scheme about that!
  (do ((i 0 (+ i 1))
       (f 0.0 (+ f 1)))
      ((= i n) #f)
    ;; using inexact f instead of integer i
    ;; makes every result of cos and sin inexact
    (vector-set! a i (cons (cos f) (sin f))))
  (do ((i 0 (+ i 1)))
      ((= i n) #f)
    (do ((j 0 (+ j 1)))
        ((= j n) #f)
      ;; We guarantee the argument is a vector
      ;; and nothing wrong will happen using unsafe accessors
      (let ((ai (unsafe-vector-ref a i))
            (aj (unsafe-vector-ref a j)))
        ;; use unsafe float versions of + and *
        ;; since this is where most of the time is spent
        ;; also use unsafe car/cdr as we guarantee the argument is
        ;; a pair.
        (set! acc (unsafe-fl+ acc
                              (unsafe-fl+ (unsafe-fl* (unsafe-car ai) (unsafe-cdr aj))
                                          (unsafe-fl* (unsafe-cdr ai) (unsafe-car aj))))))))
  (write acc)
  (newline))
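I have made an effort to keep the style of the original code. It is not very idiomatic Scheme. For example, I would not have used set! at all, but that does not affect the speed.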