How do I pass an argument to the function passed to mapPartitions?

Time: 2015-10-20 08:07:43

Tags: scala apache-spark

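I am trying to use mapPartitions instead of map. The problem is that I want to pass an Array as an additional argument, but mapPartitions only accepts a function over the partition's iterator, so there is no parameter for the Array. How can I pass the array in?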

mapPartitions[U: ClassTag](
    f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false)

1 answer:

Answer 0 (score: 2)

It is not clear what you are asking, so I assume you have a function that looks more or less like this:

    def foo(iter: Iterator[T], xs: Array[V]): Iterator[U] = ???

and you want to pass it to mapPartitions.

You have three options:

  1. Use an anonymous function:

    val xs: Array[V] = ???
    val rdd: RDD[T] = ???
    rdd.mapPartitions(iter => foo(iter, xs))
  2. Rewrite foo to support currying:

    def foo(xs: Array[V])(iter: Iterator[T]): Iterator[U] = ???
    // Rest as before
    rdd.mapPartitions(foo(xs))
  3. Curry foo as is, without changing its definition.
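The three options can be compared side by side in a small sketch that runs without Spark. Everything here beyond foo and xs is an assumption made for illustration: mapPartitionsLocal is a hypothetical stand-in for RDD.mapPartitions that treats each inner Seq as one partition, fooCurried is a made-up name for the rewritten function, and the Int types and the body of foo are placeholders. The form shown for the third option, based on Function2's curried method, is one possible way to curry foo unchanged and is not necessarily what the answer originally showed.

```scala
object CurrySketch {
  // Hypothetical stand-in for RDD.mapPartitions: each inner Seq plays
  // the role of one partition, so no Spark dependency is needed.
  def mapPartitionsLocal[T, U](partitions: Seq[Seq[T]])(
      f: Iterator[T] => Iterator[U]): Seq[Seq[U]] =
    partitions.map(p => f(p.iterator).toList)

  // Two-argument function shaped like the one in the answer.
  // The Int types and the body are placeholders for illustration.
  def foo(iter: Iterator[Int], xs: Array[Int]): Iterator[Int] =
    iter.map(_ + xs.sum)

  // Option 2: the same logic with the array in its own parameter list.
  def fooCurried(xs: Array[Int])(iter: Iterator[Int]): Iterator[Int] =
    foo(iter, xs)

  val xs: Array[Int] = Array(1, 2, 3)
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3))

  // Option 1: an anonymous function that closes over xs.
  val viaLambda = mapPartitionsLocal(partitions)(iter => foo(iter, xs))

  // Option 2: apply the first parameter list; the rest becomes the function.
  val viaCurriedDef = mapPartitionsLocal(partitions)(fooCurried(xs))

  // Option 3 (assumed form): curry the unchanged foo and fix its second argument.
  val viaCurried = mapPartitionsLocal(partitions)((foo _).curried(_)(xs))
}
```

All three calls hand mapPartitionsLocal a function of type Iterator[Int] => Iterator[Int], which is the same shape mapPartitions expects, so the choice between them is mostly a matter of readability. In a real Spark job, xs is captured in the closure and serialized with each task.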