Question

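For one of the projects I'm currently working on, I need to look at the performance (among other things) of different concurrency-enabled programming languages.

At the moment I'm looking into comparing Stackless Python and C++ PThreads, so the focus is on these two languages, but other languages will probably be tested later. Of course the comparison must be as representative and accurate as possible, so my first thought was to look for some standard concurrent/multi-threaded benchmark problems. Alas, I couldn't find any decent or standard tests/problems/benchmarks.

So my question is as follows: do you have a suggestion for a good, easy or quick problem to test the performance of a programming language (and to expose its strong and weak points in the process)?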

Answer 1

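Surely you should be testing hardware and compilers rather than a language for concurrency performance?

I would look at a language from the point of view of how easy and productive it is in terms of concurrency, and how much it "insulates" the programmer from making locking mistakes.

EDIT: from past experience as a researcher designing parallel algorithms, I think you will find that in most cases concurrent performance depends largely on how an algorithm is parallelised and how well it targets the underlying hardware.

Also, benchmarks are notoriously unequal; this is even more so in a parallel environment. For instance, a benchmark that "crunches" very large matrices suits a vector pipeline processor, whereas a parallel sort might be better suited to more general-purpose multi-core CPUs.

These might be useful: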

Parallel Benchmarks

NAS Parallel Benchmarks

Answer 2

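Well, there are a few classics, but different tests emphasize different features. Some distributed systems may be more robust, have more efficient message passing, and so on. Higher message overhead can hurt scalability, since the normal way to scale up to more machines is to send a larger number of small messages. Some classic problems you can try are a distributed Sieve of Eratosthenes or a poorly implemented Fibonacci sequence calculator (i.e. to calculate the 8th number in the series, spin off one machine for the 7th and another for the 6th). Pretty much any divide-and-conquer algorithm can be done concurrently. You could also do a concurrent implementation of Conway's Game of Life or of heat transfer. Note that all of these algorithms have a different focus, so you probably will not get one distributed system doing best in all of them.

I'd say the easiest one to implement quickly is the poorly implemented Fibonacci calculator, though it places too much emphasis on creating threads and too little on communication between those threads.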
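As a rough illustration of that deliberately wasteful Fibonacci calculator, here is a minimal sketch that uses `std::thread` (C++11) in place of separate machines. The names `naiveFib` and `fibThreaded` are made up for this example, and the indexing (fib(0) = 0, fib(1) = 1) is an assumption.

```cpp
#include <functional>  // std::ref
#include <thread>

// Deliberately naive: every call above the base case spawns two new threads,
// one for fib(n-1) and one for fib(n-2). The number of threads grows
// exponentially with n, so only use small values of n.
void naiveFib(int n, long& out) {
    if (n < 2) {
        out = n;  // fib(0) = 0, fib(1) = 1
        return;
    }
    long a = 0, b = 0;
    std::thread left(naiveFib, n - 1, std::ref(a));
    std::thread right(naiveFib, n - 2, std::ref(b));
    left.join();
    right.join();
    out = a + b;
}

// Convenience wrapper that returns the result by value.
long fibThreaded(int n) {
    long result = 0;
    naiveFib(n, result);
    return result;
}
```

Calling `fibThreaded(10)` creates well over a hundred short-lived threads for very little arithmetic, which is exactly why this problem mostly measures thread creation cost rather than communication.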

Answer 3

Surely you should be testing hardware and compilers rather than a language for concurrency performance?

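No, hardware and compilers are irrelevant for my testing purposes. I'm just looking for some good problems that can test how well code written in one language competes against code written in another. I'm really testing the constructs that the specific languages offer for concurrent programming. One of the criteria is performance (measured in time).

Some of the other test criteria I'm looking at are:

how easy it is to write correct code; as we all know, concurrent programming is harder than writing single-threaded programs
what technique is used for concurrent programming: event-driven, actor-based, message passing, ...
how much code the programmer has to write himself and how much is done automatically for him: this can also be tested with the given benchmark problems
what the level of abstraction is and how much overhead is involved when it is translated back to machine code

So actually, I'm not looking at performance as the only and best parameter (which would indeed send me to the hardware and the compilers instead of the language itself). I'm looking from a programmer's point of view to check which language is best suited to which kind of problem, what its weaknesses and strengths are, and so on...

Bear in mind that this is just a small project, so the tests are to be kept small as well. (Rigorous testing of everything is therefore not feasible.)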
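Since one of the criteria above is performance measured in time, a small wall-clock timing helper could be shared by all the C++ test programs. This is only a sketch: the name `secondsTaken` is hypothetical, and it assumes a C++11 compiler for `std::chrono`.

```cpp
#include <chrono>

// Runs the given callable once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it is not affected by system clock adjustments.
template <typename Work>
double secondsTaken(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as `secondsTaken([&] { runBenchmark(); })` (with `runBenchmark` standing in for whatever is being measured) returns the duration of a single run; repeating the measurement several times and comparing the runs gives a more trustworthy figure than one sample.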

Answer 4

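I have decided to use the Mandelbrot set (the escape time algorithm, to be more precise) to benchmark the different languages.
It suits me quite well, as the original algorithm is easy to implement and creating a multi-threaded variant from it is not that much work.

Below is the code I currently have. It is still a single-threaded variant, but I'll update it as soon as I'm satisfied with the result.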

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>


int DoThread(const double x, const double y, int maxiter) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++) {
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;
}

void SingleThreaded(int horizPixels, int vertPixels, int maxiter, std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels; x > 0; x--) {
        for(int y = vertPixels; y > 0; y--) {
            //3.0 -> so we always have -1.5 -> 1.5 as the window; (x - (horizPixels / 2) will go from -horizPixels/2 to +horizPixels/2
            result[x-1][y-1] = DoThread((3.0 / horizPixels) * (x - (horizPixels / 2)),(3.0 / vertPixels) * (y - (vertPixels / 2)),maxiter);
        }
    }
}

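int main(int argc, char* argv[]) {
    //all four arguments are required; bail out instead of reading past argv
    if (argc < 5) {
        std::cerr << "usage: " << argv[0] << " width height maxiter threads" << std::endl;
        return 1;
    }

    //first arg = length along horizontal axis
    int horizPixels = atoi(argv[1]);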

    //second arg = length along vertical axis
    int vertPixels = atoi(argv[2]);

    //third arg = iterations
    int maxiter = atoi(argv[3]);

    //fourth arg = threads
    int threadCount = atoi(argv[4]);

    std::vector<std::vector<int> > result(horizPixels, std::vector<int>(vertPixels,0)); //create and init 2-dimensional vector
    SingleThreaded(horizPixels, vertPixels, maxiter, result);

    //TODO: remove these lines
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
            std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
        }
        std::cout << std::endl;
    }
}

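I've tested it with gcc under Linux, but I'm sure it works with other compilers and operating systems as well. To get it working you have to pass some command-line arguments, like so:

mandelbrot 106 500 255 1

the first argument is the width (x-axis)
the second argument is the height (y-axis)
the third argument is the maximum number of iterations (the number of colors)
the last one is the number of threads (currently not used)

At my resolution, the above example gives me a nice ASCII-art representation of the Mandelbrot set. But try it for yourself with different arguments (the first one will be the most important, as that is the width).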

Answer 5

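Below you can find the code I hacked together to test the multi-threaded performance of pthreads. I haven't cleaned it up and no optimizations have been made, so the code is a bit raw.

The code that saves the calculated Mandelbrot set as a bitmap is not mine; you can find it here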

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>

#include "bitmap_Image.h" //for saving the mandelbrot as a bmp

#include <pthread.h>

pthread_mutex_t mutexCounter = PTHREAD_MUTEX_INITIALIZER; //must be initialized before first use
int sharedCounter(0);
int percent(0);

int horizPixels(0);
int vertPixels(0);
int maxiter(0);

//doesn't need to be locked
std::vector<std::vector<int> > result; //create 2 dimensional vector

void *DoThread(void *null) {
    double curX,curY,xSquare,ySquare,x,y;
    int i, intx, inty, counter;
    counter = 0;

    do {
        counter++;
        pthread_mutex_lock (&mutexCounter); //lock
            intx = int((sharedCounter / vertPixels) + 0.5);
            inty = sharedCounter % vertPixels;
            sharedCounter++;
        pthread_mutex_unlock (&mutexCounter); //unlock

        //exit thread when finished
        if (intx >= horizPixels) {
            std::cout << "exited thread - I did " << counter << " calculations" << std::endl;
            pthread_exit((void*) 0);
        }

        //set x and y to the correct value now -> in the range like singlethread
        x = (3.0 / horizPixels) * (intx - (horizPixels / 1.5));
        y = (3.0 / vertPixels) * (inty - (vertPixels / 2));

        curX = x + x*x - y*y;
        curY = y + x*y + x*y;
        ySquare = curY*curY;
        xSquare = curX*curX;

        for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
          ySquare = curY*curY;
          xSquare = curX*curX;
          curY = y + curX*curY + curX*curY;
          curX = x - ySquare + xSquare;
        }
        result[intx][inty] = i;
     } while (true);
}

int DoSingleThread(const double x, const double y) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;

}

void SingleThreaded(std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels - 1; x != -1; x--) {
        for(int y = vertPixels - 1; y != -1; y--) {
            //3.0 -> so we always have -1.5 -> 1.5 as the window; (x - (horizPixels / 2) will go from -horizPixels/2 to +horizPixels/2
            result[x][y] = DoSingleThread((3.0 / horizPixels) * (x - (horizPixels / 1.5)),(3.0 / vertPixels) * (y - (vertPixels / 2)));
        }
    }
}

void MultiThreaded(int threadCount, std::vector<std::vector<int> >&  result) {
    /* Initialize and set thread detached attribute */
    std::vector<pthread_t> thread(threadCount); //variable-length arrays are not standard C++
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);


    for (int i = 0; i < threadCount - 1; i++) {
        pthread_create(&thread[i], &attr, DoThread, NULL);
    }
    std::cout << "all threads created" << std::endl;

    for(int i = 0; i < threadCount - 1; i++) {
        pthread_join(thread[i], NULL);
    }
    std::cout << "all threads joined" << std::endl;
}

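int main(int argc, char* argv[]) {
    //all four arguments are required; bail out instead of reading past argv
    if (argc < 5) {
        std::cerr << "usage: " << argv[0] << " width height maxiter threads" << std::endl;
        return 1;
    }

    //first arg = length along horizontal axis
    horizPixels = atoi(argv[1]);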

    //second arg = length along vertical axis
    vertPixels = atoi(argv[2]);

    //third arg = iterations
    maxiter = atoi(argv[3]);

    //fourth arg = threads
    int threadCount = atoi(argv[4]);

    result = std::vector<std::vector<int> >(horizPixels, std::vector<int>(vertPixels,21)); // init 2-dimensional vector
    if (threadCount <= 1) {
        SingleThreaded(result);
    } else {
        MultiThreaded(threadCount, result);
    }


    //TODO: remove these lines
    bitmapImage image(horizPixels, vertPixels);
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
            image.setPixelRGB(x,y,16777216*result[x][y]/maxiter % 256, 65536*result[x][y]/maxiter % 256, 256*result[x][y]/maxiter % 256);
            //std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
        }
        std::cout << std::endl;
    }

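    //note: "~" is expanded by the shell, not by the C/C++ runtime, so this path may need to be replaced with an absolute one
    image.saveToBitmapFile("~/Desktop/test.bmp",32);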
}

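Good results can be obtained by running the program with the following arguments:

mandelbrot 5120 3840 256 3

That way you get an image that is 5 * 1024 wide and 5 * 768 high, with 256 colors (alas, you will only get 1 or 2) and 3 threads (1 main thread that does no work other than creating the worker threads, plus 2 worker threads)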
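The worker loop above hands out pixels through a mutex-protected counter. One possible variation, shown here only as a sketch and not as the author's code, replaces the lock with a `std::atomic<int>` and `std::thread` (both C++11). The names `escapeTime` and `renderAtomic` are invented for this example, and the iteration starts from z = 0, so the counts can differ by one from the program above.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Escape-time iteration count for the point c = (x, y), using z = z*z + c from z = 0.
int escapeTime(double x, double y, int maxiter) {
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < maxiter && zx * zx + zy * zy < 4.0) {
        const double next = zx * zx - zy * zy + x;
        zy = 2.0 * zx * zy + y;
        zx = next;
        ++i;
    }
    return i;
}

// Fills a width x height grid (stored row by row) using workerCount threads.
// The pixel-to-plane mapping mirrors the one used in the program above.
std::vector<int> renderAtomic(int width, int height, int maxiter, int workerCount) {
    std::vector<int> result(static_cast<std::size_t>(width) * height, 0);
    std::atomic<int> next(0);
    const int total = width * height;

    auto worker = [&]() {
        for (;;) {
            const int index = next.fetch_add(1);  // claim one pixel without a mutex
            if (index >= total) return;
            const int px = index % width;
            const int py = index / width;
            const double x = (3.0 / width) * (px - width / 1.5);
            const double y = (3.0 / height) * (py - height / 2);
            result[index] = escapeTime(x, y, maxiter);  // each index is written by one thread only
        }
    };

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) workers.emplace_back(worker);
    for (auto& t : workers) t.join();
    return result;
}
```

Handing out one pixel at a time keeps the load balanced but touches the shared counter very often; claiming a whole row per `fetch_add` would be a natural next experiment when comparing against the mutex version.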

Answer 6

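Since the benchmarks game moved to a quad-core machine in September 2008, many programs in different programming languages have been rewritten to exploit quad-core - for example, the first 10 mandelbrot programs.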
