Question

Originally I had a salesList: List[Sale], and to get the ID of the last sale in the list I used lastOption:

val lastSaleId: Option[Any] = salesList.lastOption.map(_.saleId)

But now I have changed a method that took a List[Sale] so that it works with salesListRdd: List[RDD[Sale]]. So I changed the way I get the last sale ID:

  val lastSaleId: Option[Any] = SparkContext
    .union(salesListRdd)
    .collect().toList
    .lastOption.map(_.saleId)

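I am not sure this is the best approach, because I am still collecting the RDD into a List. That brings all the data to the driver node, which may cause the driver to run out of memory.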

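Is there a way to get the ID of the last sale from the RDD while preserving the initial order of the records? Not any sorted order, but the order in which the Sale objects were originally stored in the list?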

Answer 1

There are at least two valid solutions. You can use top together with zipWithIndex:

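import org.apache.spark.rdd.RDD

// Pair each element with its global index, then take the element with the largest index.
def lastValue[T](rdd: RDD[T]): Option[T] = {
  rdd.zipWithIndex().map(_.swap).top(1)(Ordering[Long].on(_._1)).headOption.map(_._2)
}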

Or top with a custom key:

// Key each element by (partition index, position within the partition).
def lastValue[T](rdd: RDD[T]): Option[T] = {
  rdd.mapPartitionsWithIndex(
    (i, iter) => iter.zipWithIndex.map { case (x, j) => ((i, j.toLong), x) }
  ).top(1)(Ordering[(Int, Long)].on(_._1)).headOption.map(_._2)
}

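The former requires an additional action for zipWithIndex (it triggers a separate Spark job when the RDD has more than one partition), while the latter does not.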
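
To see why the (partition index, position) key picks out the last record, here is a plain-Scala model with no Spark dependency. It only approximates an RDD as a Vector of partitions, and lastValueModel is a hypothetical name used for this illustration, not a Spark API:

```scala
// Plain-Scala model: an RDD is approximated as a Vector of partitions.
// `lastValueModel` is a hypothetical helper, used only for illustration.
def lastValueModel[T](partitions: Vector[Vector[T]]): Option[T] = {
  val keyed = for {
    (part, i) <- partitions.zipWithIndex // i = partition index
    (x, j)    <- part.zipWithIndex       // j = position inside the partition
  } yield ((i, j.toLong), x)

  // Tuples compare lexicographically: partition index first, then position.
  if (keyed.isEmpty) None else Some(keyed.maxBy(_._1)._2)
}

val parts = Vector(Vector("a", "b", "c"), Vector("d"), Vector.empty[String])
val last = lastValueModel(parts) // Some("d")
```

The trailing empty partition contributes no candidates, so the largest key belongs to the last element of the last non-empty partition.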

Before using either one, be sure you understand the limitations. Quoting the docs:

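Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The unique ID assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.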

In particular, depending on the exact input, union may not preserve the input order at all.

Answer 2

You can use zipWithIndex and sort by the index in descending order, so that the last record ends up on top, and then take(1).
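
A minimal sketch of the steps described in this answer, assuming Spark's RDD API is on the classpath. The helper name lastBySort is hypothetical, and this is not necessarily the code the answer originally used:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper illustrating the description above.
def lastBySort[T](rdd: RDD[T]): Option[T] =
  rdd.zipWithIndex()                 // RDD[(element, index)]
    .sortBy(_._2, ascending = false) // full sort: highest index first
    .take(1)                         // bring only one record to the driver
    .headOption
    .map(_._1)
```

Only a single record reaches the driver, but sortBy still shuffles the whole dataset.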

The solution is taken from here: http://www.swi.com/spark-rdd-getting-bottom-records/ However, it is extremely inefficient, because it shuffles a lot of data between partitions.

Scala - How to select the last element from an RDD?
