Why is this OCaml program faster than my C program?

Asked: 2009-10-19 00:17:24

Tags: c performance ocaml

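I've written a basic Hippity Hop program in C, Python, and OCaml. Granted, this is probably not a very good benchmark of these three languages. But the results I got were something like this: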

  • Python: .350 seconds
  • C: .050 seconds
  • Interpreted OCaml: .040 seconds
  • Compiled OCaml: .010 seconds

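The Python performance doesn't really surprise me, but I'm rather shocked at how fast the OCaml is (especially the interpreted version). For comparison, I'll post the C version and the OCaml version.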

C

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

long get_count(char *name);

int main(int argc, char *argv[])
{
  if (argc != 2){
    printf("Filename must be specified as a positional argument.\n");
    exit(EXIT_FAILURE);
  }

  long count_no = get_count(argv[1]);

  int i;
  for (i = 1; i <= count_no; i++){
    if (((i % 3) == 0) && ((i % 5) == 0)){
      printf("Hop\n");
      continue;
    }
    if ((i % 3) == 0){
      printf("Hoppity\n");
    }
    if ((i % 5) == 0){
      printf("Hophop\n");
    }
  }
  return 0;
}

long get_count(char *name){
  FILE *fileptr = fopen(name, "r");
  if (!fileptr){
    printf("Unable to open file %s.\n", name);
    exit(EXIT_FAILURE);
  }
  size_t text_len = 20;
  char *file_text = calloc(text_len, sizeof(char));
  while (!feof(fileptr)){
    fread(file_text, sizeof(char), text_len, fileptr);
    assert(!ferror(fileptr));
    text_len += 20;
    file_text = realloc(file_text, text_len * sizeof(char));
  }
  long file_as_int = strtol(file_text, NULL, 10);

  free(file_text);
  return file_as_int;
}

OCaml

open String;;

let trim str =
  if str = "" then "" else
  let search_pos init p next =
    let rec search i =
      if p i then raise(Failure "empty") else
      match str.[i] with
      | ' ' | '\n' | '\r' | '\t' -> search (next i)
      | _ -> i
    in
    search init
  in
  let len = String.length str in
  try
    let left = search_pos 0 (fun i -> i >= len) (succ)
    and right = search_pos (len - 1) (fun i -> i < 0) (pred)
    in
    String.sub str left (right - left + 1)
  with
  | Failure "empty" -> ""
;;

let rec iterate_over_numbers curr_num max_num =
  (
   if curr_num <= max_num then (
     if ((curr_num mod 3) == 0) && ((curr_num mod 5) == 0) then 
       print_endline "Hop"
     else if (curr_num mod 3) == 0 then 
       print_endline "Hoppity"
     else if (curr_num mod 5) == 0 then
       print_endline "Hophop";
     iterate_over_numbers (curr_num + 1) max_num
   ))
;;


let fname = Sys.argv.(1);;
let infile = open_in fname;;
let file_text = trim (input_line infile);;
close_in infile;;
let input_number = int_of_string file_text;;
iterate_over_numbers 1 input_number;;

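But I'm curious to know why I'm getting these results. Am I doing something dumb in my C program, or is this just something OCaml is faster at? It seems a bit strange to me that the interpreted program runs a little faster than the C version, and that the compiled program runs 5 times as fast.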

5 answers:

Answer 0 (score: 9)

Your C code isn't equivalent to the OCaml code: in the OCaml you used 'else if' to avoid recomputing the moduli.

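You've got an awful lot of code in that 'read the long integer' function. Why not just use fscanf()? It skips whitespace automatically and saves you from doing the malloc() and so on. I don't often recommend fscanf(), but this looks like a setup for it: a single line, possibly with spaces on either side, and no funny stuff.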
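As a rough illustration of that suggestion, here is a minimal sketch of an fscanf()-based get_count(). It is not the zzz.c variant mentioned in this answer (that file is not shown); the error messages and the use of stderr are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one integer from the named file. fscanf() with "%ld" skips any
 * leading whitespace, so no manual trimming or buffer growth is needed. */
long get_count(const char *name)
{
    FILE *fileptr = fopen(name, "r");
    if (fileptr == NULL) {
        fprintf(stderr, "Unable to open file %s.\n", name);
        exit(EXIT_FAILURE);
    }

    long count = 0;
    /* fscanf() returns the number of successful conversions,
     * so anything other than 1 means no integer was read. */
    if (fscanf(fileptr, "%ld", &count) != 1) {
        fprintf(stderr, "No integer found in %s.\n", name);
        fclose(fileptr);
        exit(EXIT_FAILURE);
    }

    fclose(fileptr);
    return count;
}
```

This keeps the same signature idea as the question's get_count(), so main() could call it unchanged; trailing spaces or a newline after the number are simply left unread.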


Curiosity killed the cat, but in this case, not the Leopard.

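I downloaded OCaml 3.11.1 for MacOS X Intel, copied the OCaml code from the question into xxx.ml, and compiled it into an object file xxx (using "ocamlc -o xxx xxx.ml"). I copied the C code verbatim into yyy.c, created a variant zzz.c that uses fscanf() and fclose(), and compiled them with "gcc -O -o yyy yyy.c" and "gcc -O -o zzz zzz.c". I created a file 'file3' containing "987654" plus a newline. I created a shell script, runthem.sh, as shown. Note that 'time' is a pesky command that thinks its output must go to stderr even if you'd rather it didn't; you have to work quite hard to get the output where you want it. (The range command generates the numbers in the given range, inclusive, hence 11 values per program.)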

Osiris JL: cat runthem.sh
for prog in "ocaml xxx.ml" ./xxx ./yyy ./zzz
do
    for iter in $(range 0 10)
    do
        r=$(sh -c "time $prog file3 >/dev/null" 2>&1)
        echo $prog: $r
    done
done
Osiris JL: 

I ran all of this on a modern MacBook Pro (3 GHz Core 2 Duo etc., 4 GB RAM) running Leopard (10.5.8). The timings I got are shown below:

Osiris JL: sh runthem.sh
ocaml xxx.ml: real 0m0.961s user 0m0.524s sys 0m0.432s
ocaml xxx.ml: real 0m0.953s user 0m0.516s sys 0m0.430s
ocaml xxx.ml: real 0m0.959s user 0m0.517s sys 0m0.431s
ocaml xxx.ml: real 0m0.951s user 0m0.517s sys 0m0.430s
ocaml xxx.ml: real 0m0.952s user 0m0.516s sys 0m0.431s
ocaml xxx.ml: real 0m0.952s user 0m0.514s sys 0m0.431s
ocaml xxx.ml: real 0m0.951s user 0m0.515s sys 0m0.431s
ocaml xxx.ml: real 0m0.959s user 0m0.515s sys 0m0.431s
ocaml xxx.ml: real 0m0.950s user 0m0.515s sys 0m0.431s
ocaml xxx.ml: real 0m0.956s user 0m0.516s sys 0m0.431s
ocaml xxx.ml: real 0m0.952s user 0m0.514s sys 0m0.432s
./xxx: real 0m0.928s user 0m0.494s sys 0m0.430s
./xxx: real 0m0.938s user 0m0.494s sys 0m0.430s
./xxx: real 0m0.927s user 0m0.494s sys 0m0.430s
./xxx: real 0m0.928s user 0m0.492s sys 0m0.430s
./xxx: real 0m0.928s user 0m0.493s sys 0m0.430s
./xxx: real 0m0.927s user 0m0.493s sys 0m0.430s
./xxx: real 0m0.928s user 0m0.492s sys 0m0.430s
./xxx: real 0m0.933s user 0m0.497s sys 0m0.428s
./xxx: real 0m0.926s user 0m0.494s sys 0m0.429s
./xxx: real 0m0.921s user 0m0.492s sys 0m0.428s
./xxx: real 0m0.925s user 0m0.494s sys 0m0.428s
./yyy: real 0m0.027s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.031s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.029s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.029s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.031s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.030s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
Osiris JL:

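I don't see the OCaml code running faster than the C code. I also ran the tests with smaller numbers in the input file, and the results were likewise in favour of the C code: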

Stop number: 345

ocaml xxx.ml: real 0m0.027s user 0m0.020s sys 0m0.005s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.005s
ocaml xxx.ml: real 0m0.025s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.020s user 0m0.015s sys 0m0.003s
ocaml xxx.ml: real 0m0.022s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.019s user 0m0.015s sys 0m0.003s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.020s user 0m0.015s sys 0m0.004s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.020s user 0m0.015s sys 0m0.004s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.004s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.002s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.002s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.005s user 0m0.001s sys 0m0.002s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.002s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.003s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.001s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.003s user 0m0.000s sys 0m0.002s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.001s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.003s user 0m0.000s sys 0m0.002s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s

Stop number: 87654

ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.101s user 0m0.060s sys 0m0.040s
ocaml xxx.ml: real 0m0.103s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.101s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.103s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.101s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.105s user 0m0.059s sys 0m0.041s
./xxx: real 0m0.092s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.087s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.084s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.086s user 0m0.045s sys 0m0.039s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.084s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.084s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.083s user 0m0.044s sys 0m0.038s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.005s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.005s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.006s user 0m0.003s sys 0m0.002s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.005s user 0m0.003s sys 0m0.002s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.005s user 0m0.003s sys 0m0.001s

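Obviously, YMMV, but it appears that OCaml is a good deal slower than C here. If the number in the given file is small enough, though, start-up and file reading dominate the processing time.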

The C timings, especially for the smaller numbers, are so short that they are not all that reliable.

Answer 1 (score: 8)

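Times below 0.05 s may be nothing more than noise. Repeat the main work enough times to get about 1 s of execution time in C. (I mean repeating it in a loop inside the program itself, not running the program again.)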
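One way to do that repetition is sketched below. The function names hop_pass and time_passes are hypothetical, made up for this example, and clock() measures CPU time rather than wall-clock time, which is an assumption about what you want to compare.

```c
#include <stdio.h>
#include <time.h>

/* One pass of the Hippity Hop loop, written to `out`.
 * Returns the number of lines written. */
long hop_pass(long limit, FILE *out)
{
    long lines = 0;
    for (long i = 1; i <= limit; i++) {
        int div3 = (i % 3 == 0);
        int div5 = (i % 5 == 0);
        if (div3 && div5)
            fputs("Hop\n", out);
        else if (div3)
            fputs("Hoppity\n", out);
        else if (div5)
            fputs("Hophop\n", out);
        if (div3 || div5)
            lines++;
    }
    return lines;
}

/* Run hop_pass() `repeats` times inside one process and return the
 * CPU time used, in seconds, as reported by clock(). */
double time_passes(long limit, int repeats, FILE *out)
{
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        hop_pass(limit, out);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Increase the repeat count until the returned time is around a second, then divide by the count to get a per-pass figure that is less affected by process start-up.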

Did you compile your code with optimizations enabled? Did you try reducing the number of branches (and comparisons)?

if (i % 3 == 0) {
  if (i % 5 == 0) {
    printf("Hop\n");
    continue;
  }
  printf("Hoppity\n");
} else if (i % 5 == 0){
  printf("Hophop\n");
}

Did you try looking at the assembler output?

Also, printf is slow. Try puts("Hop") instead, since you aren't using any formatting anyway.

Answer 2 (score: 2)

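With a tiny program like this, it's often hard to guess why things work out the way they do. If I were doing it, I think I'd write the code like this (leaving out error checking for the moment):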

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) { 
    static char buffer[20];   
    int limit, i;

    freopen(argv[1], "r", stdin);
    fgets(buffer, sizeof(buffer), stdin);
    limit = atoi(buffer);

    for (i=1; i<=limit; i++) {
        int div3=i%3==0;
        int div5=i%5==0;
        if (div3 && div5) 
            puts("Hop");
        else if (div3)
            puts("Hoppity");
        else if (div5)
            puts("Hophop");
    }
    return 0;
}

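Using freopen avoids creating another file stream and instead just connects standard input to the specified file. There's no guarantee that it's faster, but it's unlikely to be slower.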

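Likewise, a good compiler may notice that i is constant throughout the loop body and factor out the two remainder operations so that each is done only once. Here I've done that by hand, which may not be any faster, but almost certainly won't be any slower.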

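Using puts instead of printf is much the same: it may not be faster, but it almost certainly won't be slower. The way you've used printf, it has to scan the whole string looking for a '%' in case you're asking for a conversion; since puts doesn't do any conversions, it doesn't have to do that.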

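With such a tiny program, there's another factor that could be considerably more significant: puts will usually be considerably smaller than printf. You haven't said how you're doing the timing, but if it includes the time to load the code, really small code may well make more of a difference than the execution time.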

Answer 3 (score: 1)

I'd be interested to see how much time is spent in get_count().

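I'm not sure how much it matters, but you're reading in a single long, which means the string can't be longer than about 20 bytes, or 10 bytes (2^64 is a decimal number about 20 characters long, and 2^32 is one about 10 characters long), so you don't need the while loop in get_count. Also, you could allocate file_text on the stack instead of calling calloc, though I guess you'd still need to zero it out, or otherwise find the length and set the last byte to null.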

file_length = lseek(fileno(fileptr), 0, SEEK_END);  /* lseek() takes a file descriptor, not a FILE * */
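A minimal sketch of the stack-buffer idea, assuming the file holds just one number: the function name get_count_stack and the 32-byte buffer size are choices made for this example, not something taken from the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a single integer from the named file using a fixed-size buffer
 * on the stack: no calloc/realloc and no while loop. */
long get_count_stack(const char *name)
{
    /* 32 bytes is an assumed size: room for the roughly 20 digits of a
     * 64-bit value plus a sign, a newline, and the terminator. */
    char buf[32];

    FILE *fileptr = fopen(name, "r");
    if (fileptr == NULL) {
        fprintf(stderr, "Unable to open file %s.\n", name);
        exit(EXIT_FAILURE);
    }

    /* One read is enough; fread() reports how many bytes it stored,
     * so the terminator goes right after them and no zeroing is needed. */
    size_t n = fread(buf, 1, sizeof buf - 1, fileptr);
    fclose(fileptr);
    buf[n] = '\0';

    /* strtol() skips leading whitespace and stops at the first
     * character that is not part of the number. */
    return strtol(buf, NULL, 10);
}
```

Because the byte count comes back from fread(), there is no need for a separate length query before terminating the string.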

Answer 4 (score: 1)

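Any program that mostly involves opening a file and reading it is limited by the speed of opening the file and reading it. The C computations you're doing here will take somewhere between one millionth and one thousandth of the time it takes to open the file and read it.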

I think this site is useful: http://norvig.com/21-days.html#answers