Question

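I'm trying out some parallel programming with Scala and Akka, both of which are new to me. I have a very simple Monte Carlo Pi application (it approximates pi using random points in a circle) that I have built in several languages. However, the performance of the version I built with Akka is puzzling me.

I have a sequential version written in pure Scala that takes roughly 400 ms to complete.

By comparison, the Akka version takes around 300-350 ms with 1 worker actor, but as I increase the number of actors that time increases dramatically. With 4 actors the time can be anywhere from 500 ms up to 1200 ms or higher.

The iterations are divided among the worker actors, so ideally performance should improve as more workers are added; currently it gets significantly worse.

My code is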

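import java.io.FileWriter

import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

object MCpi{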
  //Declare initial values
  val numWorkers = 2
  val numIterations = 10000000

  //Declare messages that will be sent to actors
  sealed trait PiMessage
  case object Calculate extends PiMessage
  case class Work(iterations: Int) extends PiMessage
  case class Result(value: Int) extends PiMessage
  case class PiApprox(pi: Double, duration: Double)

  //Main method
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("MCpi_System") //Create Akka system
    val master = system.actorOf(Props(new MCpi_Master(numWorkers, numIterations))) //Create Master Actor
    println("Starting Master")

    master ! Calculate //Run calculation
  }
}

//Master
class MCpi_Master(numWorkers: Int, numIterations: Int) extends Actor{

  var pi: Double = _ // Store pi
  var quadSum: Int = _ //the total number of points inside the quadrant
  var numResults: Int = _ //number of results returned
  val startTime: Double = System.currentTimeMillis() //calculation start time

  //Create a group of worker actors
  val workerRouter = context.actorOf(
    Props[MCpi_Worker].withRouter(RoundRobinPool(numWorkers)), name = "workerRouter")
  val listener = context.actorOf(Props[MCpi_Listener], name = "listener")

  def receive = {
    //Tell workers to start the calculation
    //For each worker a message is sent with the number of iterations it is to perform,
    //iterations are split up between the number of workers.
    case Calculate => for(i <- 0 until numWorkers) workerRouter ! Work(numIterations / numWorkers)

    //Receive the results from the workers
    case Result(value) =>
      //Add up the total number of points in the circle from each worker
      quadSum += value
      //Total up the number of results which have been received, this should be 1 for each worker
      numResults += 1

      if(numResults == numWorkers) { //Once all results have been collected
        //Calculate pi
        pi = (4.0 * quadSum) / numIterations
        //Send the results to the listener to output
        listener ! PiApprox(pi, duration = System.currentTimeMillis - startTime)
        context.stop(self)
      }
  }
}
//Worker
class MCpi_Worker extends Actor {
  //Performs the calculation
  def calculatePi(iterations: Int): Int = {

    val r = scala.util.Random // Create random number generator
    var inQuadrant: Int = 0 //Store number of points within circle

    for(i <- 0 until iterations){ //Exclusive bound: exactly `iterations` samples
      //Generate random point
      val X = r.nextFloat()
      val Y = r.nextFloat()

      //Determine whether or not the point is within the circle
      if(((X * X) + (Y * Y)) < 1.0)
        inQuadrant += 1
    }
    inQuadrant //return the number of points within the circle
  }

  def receive = {
    //Starts the calculation then returns the result
    case Work(iterations) => sender ! Result(calculatePi(iterations))
  }
}

//Listener
class MCpi_Listener extends Actor{ //Receives and prints the final result
  def receive = {
    case PiApprox(pi, duration) =>
      //Print the results
      println("\n\tPi approximation: \t\t%s\n\tCalculation time: \t%s".format(pi, duration))
      //Print to a CSV file
      val pw: FileWriter = new FileWriter("../../../..//Results/Scala_Results.csv", true)
      pw.append(duration.toString())
      pw.append("\n")
      pw.close()
      context.system.terminate()
  }
}

The plain Scala sequential version is

import java.io.FileWriter

object MCpi {
    def main(args: Array[String]): Unit = {
        //Define the number of iterations to perform
        val iterations = args(0).toInt
        //Directory that the results CSV is written to
        val resultsPath = args(1)

        //Get the current time
        val start = System.currentTimeMillis()


        // Create random number generator
        val r = scala.util.Random
        //Store number of points within circle
        var inQuadrant: Int = 0

        for(i <- 0 until iterations){ //Exclusive bound: exactly `iterations` samples
            //Generate random point
            val X = r.nextFloat()
            val Y = r.nextFloat()

            //Determine whether or not the point is within the circle
            if(((X * X) + (Y * Y)) < 1.0)
                inQuadrant += 1
        }
        //Calculate pi
        val pi = (4.0 * inQuadrant) / iterations
        //Get the total time
        val time = System.currentTimeMillis() - start
        //Output values
        println("Number of Iterations: " + iterations)
        println("Pi has been calculated as: " + pi)
        println("Total time taken: " + time + " (Milliseconds)")

        //Print to a CSV file
        val pw: FileWriter = new FileWriter(resultsPath + "/Scala_Results.csv", true)
        pw.append(time.toString())
        pw.append("\n")
        pw.close()
    }
}

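Any suggestions as to why this is happening, or how I can improve performance, would be very welcome.

Edit: I'd like to thank all of you for your answers. This is my first question on this site and all the answers are extremely helpful; I have lots to look into now :)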

Answer 1

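You have a synchronization issue around the Random instance you're using.

More specifically, this line

val r = scala.util.Random // Create random number generator

does not actually "create a random number generator"; it picks up the singleton object that scala.util conveniently provides. This means that all threads share it and synchronize around its seed: every call updates the same seed atomically, so concurrent callers keep contending with each other (see the code of java.util.Random.nextFloat for details).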
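The sharing is easy to check with a reference comparison. This is a minimal sketch; the object name SharedRandomCheck is made up for illustration and is not part of the question's code.

```scala
import scala.util.Random

object SharedRandomCheck {
  // Without `new`, `scala.util.Random` refers to the companion object,
  // which is a single instance for the whole JVM.
  val a: Random = scala.util.Random
  val b: Random = scala.util.Random

  def main(args: Array[String]): Unit = {
    // Reference equality: both names point at the same generator,
    // so every worker that does `val r = scala.util.Random` shares it.
    println(s"a eq b: ${a eq b}")
  }
}
```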

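By simply changing that line to

val r = new scala.util.Random // Create random number generator

you should get some parallelization speed-up. As stated in the comments, the speed-up will depend on your architecture and so on, but at least it will no longer be so badly skewed by heavy synchronization.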

Note that java.util seeds a newly created Random from System.nanoTime, so you need not worry about randomization issues.
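To try the effect outside Akka, the same idea can be sketched with plain Futures, where every task owns its generator. The names PerWorkerRandom, countInQuadrant and estimatePi are hypothetical, and the global execution context stands in for the actor dispatcher; this is an illustration under those assumptions, not the asker's program.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object PerWorkerRandom {
  // Counts random points that fall inside the unit quarter circle.
  // The generator is a parameter so that each task can own one.
  def countInQuadrant(iterations: Int, rng: Random): Int = {
    var inQuadrant = 0
    var i = 0
    while (i < iterations) {
      val x = rng.nextFloat()
      val y = rng.nextFloat()
      if (x * x + y * y < 1.0f) inQuadrant += 1
      i += 1
    }
    inQuadrant
  }

  // Splits the work across `workers` futures; each one gets its own `new Random`,
  // so no seed is shared between threads.
  def estimatePi(totalIterations: Int, workers: Int): Double = {
    val perWorker = totalIterations / workers
    val tasks = Seq.fill(workers)(Future(countInQuadrant(perWorker, new Random)))
    val inside = Await.result(Future.sequence(tasks), 1.minute).sum
    // Divide by the number of samples actually drawn (integer division may drop a remainder).
    4.0 * inside / (perWorker.toLong * workers)
  }
}
```

Calling PerWorkerRandom.estimatePi(10000000, 4) mirrors the question's setup of 10 million iterations split over 4 workers. Passing the shared scala.util.Random object instead of new Random gives a way to compare the two variants on your own machine.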

Answer 2

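I think this is a great question that is worth digging into. Since the Akka actor system comes with some overhead, I expected a performance gain to show up only when the scale is large enough. I test-ran your two versions (non-Akka vs Akka) with minimal code changes. As expected, at 1 million or 10 million hits there is hardly any performance difference, regardless of Akka vs non-Akka or the number of workers used. At 100 million hits, however, you can see a consistent performance difference.

Besides scaling the total number of hits up to 100 million, the only code change I made was replacing scala.util.Random with java.util.concurrent.ThreadLocalRandom: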

//val r = scala.util.Random // Create random number generator
def r = ThreadLocalRandom.current
...
  //Generate random point
  //val X = r.nextFloat()
  //val Y = r.nextFloat()
  val X = r.nextDouble(0.0, 1.0)
  val Y = r.nextDouble(0.0, 1.0)
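
For reference, here is the fragment above as a self-contained sketch. The wrapper object ThreadLocalPi and the method name countInQuadrant are made up for illustration. Declaring r as a def rather than a val matters: ThreadLocalRandom.current returns the generator of the calling thread, and an actor is not guaranteed to process every message on the same thread, so the lookup is best repeated at the point of use.

```scala
import java.util.concurrent.ThreadLocalRandom

object ThreadLocalPi {
  // `def`, not `val`: look up the generator of whichever thread is running right now.
  private def r: ThreadLocalRandom = ThreadLocalRandom.current()

  // Counts random points that fall inside the unit quarter circle.
  def countInQuadrant(iterations: Int): Int = {
    var inQuadrant = 0
    var i = 0
    while (i < iterations) {
      val x = r.nextDouble(0.0, 1.0)
      val y = r.nextDouble(0.0, 1.0)
      if (x * x + y * y < 1.0) inQuadrant += 1
      i += 1
    }
    inQuadrant
  }
}
```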

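All of this was done on an old MacBook Pro with a 2 GHz quad-core CPU and 8 GB of memory. Here are the test-run results at 100 million total hits:

Non-Akka app takes ~1720 ms
Akka app with 2 workers takes ~770 ms
Akka app with 4 workers takes ~430 ms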
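
To put the timings above into perspective, speed-up and parallel efficiency can be computed from them directly. The sketch below uses only the three approximate figures reported here; the object Speedup and its method names are hypothetical.

```scala
object Speedup {
  // Speed-up relative to the sequential baseline.
  def speedup(baselineMs: Double, parallelMs: Double): Double =
    baselineMs / parallelMs

  // Parallel efficiency: speed-up divided by worker count (1.0 means linear scaling).
  def efficiency(baselineMs: Double, parallelMs: Double, workers: Int): Double =
    speedup(baselineMs, parallelMs) / workers

  def main(args: Array[String]): Unit = {
    val baselineMs = 1720.0 // non-Akka run
    for ((workers, ms) <- Seq(2 -> 770.0, 4 -> 430.0))
      println(f"$workers workers: speedup ${speedup(baselineMs, ms)}%.2fx, " +
        f"efficiency ${efficiency(baselineMs, ms, workers)}%.2f")
  }
}
```

That works out to roughly 2.2x with 2 workers and 4.0x with 4 workers. An efficiency slightly above 1.0 for the 2-worker run most likely reflects measurement noise in single runs rather than genuinely better-than-linear scaling.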

Individual test runs below...

Non-Akka