I am developing a chess engine using Scala and Apache Spark (and I need to stress that my sanity is not the subject of this question). My problem is that the Negamax algorithm is recursive in its essence, and when I try the naive approach:
class NegaMaxSparc(@transient val sc: SparkContext) extends Serializable {

  val movesOrdering = new Ordering[Tuple2[Move, Double]]() {
    override def compare(x: (Move, Double), y: (Move, Double)): Int =
      Ordering[Double].compare(x._2, y._2)
  }

  def negaMaxSparkHelper(game: Game, color: PieceColor, depth: Int,
                         previousMovesPar: RDD[Move]): (Move, Double) = {
    val board = game.board
    if (depth == 0) {
      (null, NegaMax.evaluateDefault(game, color))
    } else {
      val moves = board.possibleMovesForColor(color)
      val movesPar = previousMovesPar.context.parallelize(moves)

      val moveMappingFunc = (m: Move) => {
        negaMaxSparkHelper(new Game(board.boardByMakingMove(m), color.oppositeColor, null),
                           color.oppositeColor, depth - 1, movesPar)
      }
      val movesWithScorePar = movesPar.map(moveMappingFunc)
      val move = movesWithScorePar.min()(movesOrdering)

      (move._1, -move._2)
    }
  }

  def negaMaxSpark(game: Game, color: PieceColor, depth: Int): (Move, Double) = {
    if (depth == 0) {
      (null, NegaMax.evaluateDefault(game, color))
    } else {
      val movesPar = sc.parallelize(new Array[Move](0))
      negaMaxSparkHelper(game, color, depth, movesPar)
    }
  }
}

class NegaMaxSparkBot(val maxDepth: Int, sc: SparkContext) extends Bot {
  def nextMove(game: Game): Move = {
    val nms = new NegaMaxSparc(sc)
    nms.negaMaxSpark(game, game.colorToMove, maxDepth)._1
  }
}
class NegaMaxSparkBot(val maxDepth: Int, sc: SparkContext) extends Bot {
def nextMove(game: Game): Move = {
val nms = new NegaMaxSparc(sc)
nms.negaMaxSpark(game, game.colorToMove, maxDepth)._1
}
}
I get:
org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
The question is: can this algorithm be implemented recursively using Spark? If not, what is the proper Spark way of solving this problem?
Answer 0 (score: 2)
This is a limitation that makes sense in terms of the implementation, but it can be a pain to work with.

You could try pulling the recursion out to the top level, keeping it only in the "driver" code that creates and operates on RDDs. Something like:
def step(rdd: RDD[Move], limit: Int): RDD[Move] =
  if (limit == 0) rdd
  else {
    val newRdd = rdd.flatMap(...)
    step(newRdd, limit - 1)
  }
Alternatively, it is always possible to convert recursion into iteration by managing the "stack" explicitly by hand (although it may result in more cumbersome code).
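As a small illustration of that last point (not from the answer itself; `Node`, `children`, and `sumValues` are made-up names for this sketch), recursion over a tree can be replaced by a loop over an explicit stack:

```scala
// Hypothetical sketch: recursion turned into iteration with an explicit stack.
case class Node(value: Int, children: List[Node])

def sumValues(root: Node): Int = {
  var total = 0
  var stack = List(root)                    // the explicit "stack"
  while (stack.nonEmpty) {
    val node = stack.head
    stack = node.children ::: stack.tail    // pop current, push its children
    total += node.value
  }
  total
}
```

The same transformation applies to a game-tree search, at the cost of threading the accumulated state through the loop by hand.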
Answer 1 (score: 2)
Only the driver can launch computation on RDDs. The reason is that even though RDDs "feel" like regular collections of data, behind the scenes they are still distributed collections, so launching operations on them requires coordinating the execution of tasks on all remote slaves, which Spark hides from us most of the time.

So "recursing from the slaves", i.e. launching new distributed tasks dynamically and directly from the slaves, is not possible: only the driver can take care of such coordination.
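A minimal sketch of this restriction, expanding the `rdd1`/`rdd2` example from the SPARK-5063 error message above (the variable names are placeholders):

```scala
// Invalid: rdd2 is referenced inside a transformation of rdd1,
// so an action would run on an executor — this throws SPARK-5063.
// val result = rdd1.map(x => rdd2.count() * x)

// Valid: materialize the inner value on the driver first,
// then ship the plain Long into the closure.
val n = rdd2.count()
val result = rdd1.map(x => n * x)
```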
Here is a possible alternative, a simplification of the problem (if I got it correctly). The idea is to successively build instances of Moves, each one representing the full sequence of Move from the initial state.

Each instance of Moves is able to transform itself into a set of Moves, each one corresponding to the same sequence of Move plus one possible next Move.

From there, the driver just has to successively flatMap the Moves for as deep as we want, and the resulting RDD[Moves] will execute all the operations in parallel for us.

The downside of the approach is that all depth levels are kept synchronized, i.e. we have to compute all the moves at level n (i.e. the RDD[Moves] for level n) before moving on to the next one.
The code below is not tested; it probably has flaws and might not even compile, but hopefully it gives an idea of how to approach the problem.
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

/* one modification to the board */
case class Move(from: String, to: String)

case class PieceColor(color: String)

/* state of the game */
class Board {
  // TODO
  def possibleMovesForColor(color: PieceColor): Seq[Move] =
    Move("here", "there") :: Move("there", "over there") :: Move("there", "here") :: Nil

  // TODO: compute a new instance of board here, based on current + this move
  def update(move: Move): Board = new Board
}

/** Solution, i.e. a sequence of moves */
case class Moves(moves: Seq[Move], game: Board, color: PieceColor) {
  lazy val score = NegaMax.evaluateDefault(game, color)

  /** @return all valid next Moves */
  def nextPossibleMoves: Seq[Moves] =
    game.possibleMovesForColor(color).map { nextMove =>
      copy(moves = nextMove +: moves,
           game = game.update(nextMove))
    }
}

/** Driver code: negaMax looks for the best next move from a given game state */
def negaMax(sc: SparkContext, game: Board, color: PieceColor, maxDepth: Int): Moves = {
  val initialSolution = Moves(Seq.empty[Move], game, color)

  val allPlays: RDD[Moves] =
    (1 to maxDepth).foldLeft(sc.parallelize(Seq(initialSolution))) {
      (rdd, _) => rdd.flatMap(_.nextPossibleMoves)
    }

  allPlays.reduce { case (m1, m2) => if (m1.score < m2.score) m1 else m2 }
}