Question

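In Spark, you can pass a user-defined function to mapPartitions. My question is how to pass an additional argument to that function. For example, I currently have something like the following, which I call with rdd.mapPartitions(userdefinedFunc).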

def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])]) : Iterator[(Long, Long)] = 
{
  val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]

  // Code here

  res.iterator
}

However, I also want to pass a constant as an argument to that user-defined function, so that it looks like this:

def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])], someConstant: Long) : 
 Iterator[(Long, Long)] = 
{
  val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]

  // Code here

  res.iterator
}

How do I call this function with mapPartitions? If I just use rdd.mapPartitions(userdefinedFunc(someConstant)), I get an error.

Answer 1

Use currying, that is, give the function two parameter lists, with the constant first:

def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Array[SAMRecord])]): Iterator[(Long, Long)]

Then userdefinedFunc(someConstant) can be used as a function of type Iterator[(Long, Array[SAMRecord])] => Iterator[(Long, Long)], which you can pass to mapPartitions.
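As a rough illustration of the curried form, here is a minimal sketch that runs without Spark. It makes several assumptions: `Record` is a String stand-in for SAMRecord, the function body (record count plus the constant) is invented purely for demonstration, and a plain Scala Iterator plays the role of one partition.

```scala
object CurryingSketch {
  // Stand-in for SAMRecord so the sketch compiles without htsjdk (assumption).
  type Record = String

  // The constant goes in the first parameter list, the partition iterator in the second.
  def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Array[Record])]): Iterator[(Long, Long)] =
    // Hypothetical body: pair each key with its record count plus the constant.
    iter.map { case (key, records) => (key, records.length + someConstant) }

  def main(args: Array[String]): Unit = {
    val someConstant = 10L

    // Supplying only the first parameter list yields a function over the iterator.
    val perPartition: Iterator[(Long, Array[Record])] => Iterator[(Long, Long)] =
      userdefinedFunc(someConstant)

    // With a real RDD the call would be: rdd.mapPartitions(userdefinedFunc(someConstant))
    val partition = Iterator((1L, Array("a", "b")), (2L, Array("c")))
    perPartition(partition).foreach(println) // prints (1,12) then (2,11)
  }
}
```

If you would rather keep the original two-argument signature, wrapping the call in a lambda should work as well: rdd.mapPartitions(iter => userdefinedFunc(iter, someConstant)).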
