Question

I'd like to know why the Scala compiler can't infer the types of my function's parameters when I use PairDStreamFunctions.reduceByKey. Here is the code:

val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint(".checkpoint")
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val wordCounts = words
  .map((_, 1))
  .reduceByKey((x: Int, y: Int) => x + y, 4)  // here I must specify the type Int; this form doesn't work: reduceByKey((x, y) => x + y, 4)

Here I have to specify Int as the type of my function's parameters, as in reduceByKey((x: Int, y: Int) => x + y, 4), when I use PairDStreamFunctions.reduceByKey. The form reduceByKey((x, y) => x + y, 4) does not work.

On the other hand, when I use the PairRDDFunctions.reduceByKey API, the types can be inferred. Here is the code:

val conf = new SparkConf()
val sc = new SparkContext(conf)
val rdd = sc.parallelize(List(
  "hi what"
  , "show you"
  , "matter how"
))
rdd.flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey((x, y) => x + y, 4) // in this code, the Scala compiler can infer the parameter types of (x, y) => x + y

With PairRDDFunctions.reduceByKey, reduceByKey((x, y) => x + y, 4) works fine. I really don't understand what makes the two different.

Answer 1

This happens because PairRDDFunctions has only one overload of the form reduceByKey(func: (V, V) ⇒ V, [SOMETHING]), whereas PairDStreamFunctions has two:

def reduceByKey(reduceFunc: (V, V) ⇒ V, numPartitions: Int)
def reduceByKey(reduceFunc: (V, V) ⇒ V, partitioner: Partitioner)

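So, even though the partitioner variant should be ruled out as a possibility, it still comes into play and confuses the compiler. You can see this by naming the argument explicitly: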

.reduceByKey((x, y) => x + y, numPartitions = 4)
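The named-argument workaround can be checked without Spark using a hypothetical stand-in: `DStreamLike` and `Partitioner` below are not Spark classes, they only copy the two `PairDStreamFunctions.reduceByKey` overloads quoted above, and the reduction runs on a local `Seq`. Naming `numPartitions` leaves a single applicable overload, so the lambda's parameter types can be inferred. The sketch also shows a second workaround, a function value whose type is declared up front (`sum` is an illustrative name).

```scala
object NamedArgWorkaround {
  // Hypothetical stand-in for org.apache.spark.Partitioner.
  trait Partitioner

  // Hypothetical stand-in copying only the overload shapes of
  // PairDStreamFunctions.reduceByKey; it reduces a local Seq.
  class DStreamLike[K, V](data: Seq[(K, V)]) {
    private def reduceAll(f: (V, V) => V): Map[K, V] =
      data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

    def reduceByKey(reduceFunc: (V, V) => V, numPartitions: Int): Map[K, V] =
      reduceAll(reduceFunc)
    def reduceByKey(reduceFunc: (V, V) => V, partitioner: Partitioner): Map[K, V] =
      reduceAll(reduceFunc)
  }

  val stream = new DStreamLike(Seq("hi", "what", "hi").map((_, 1)))

  // The Partitioner overload has no parameter called numPartitions, so it is
  // discarded before the lambda is typed; x and y are inferred as Int.
  val named = stream.reduceByKey((x, y) => x + y, numPartitions = 4)

  // Alternative: declare the function's type before passing it.
  val sum: (Int, Int) => Int = _ + _
  val viaTypedVal = stream.reduceByKey(sum, 4)

  def main(args: Array[String]): Unit = {
    println(named)
    println(viaTypedVal)
  }
}
```

Either form should carry over to the real PairDStreamFunctions.reduceByKey, since only the overload signatures matter here.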

I'm not sure where exactly this is defined in the compiler, but it clearly behaves this way for the reason given above.
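For what it's worth, here is a hedged reading of Scala 2's overloading rules that fits this behavior: when a method is overloaded, the compiler first filters the alternatives by the arguments' "shape" (arity, argument names, and whether an argument is a lambda). PairRDDFunctions also has reduceByKey(partitioner: Partitioner, func: (V, V) => V), but its first parameter is not a function, so only (func, numPartitions) survives and the lambda is typed against (V, V) => V. In PairDStreamFunctions both two-argument overloads survive, so the lambda would have to be typed with no expected type, which an untyped lambda can't be. The Spark-free sketch below contrasts the two cases; `RddLike`, `DStreamLike` and the local `Partitioner` are hypothetical stand-ins that copy only the overload shapes.

```scala
object OverloadRepro {
  // Hypothetical stand-in for org.apache.spark.Partitioner.
  trait Partitioner

  // Copies the reduceByKey overload shapes of PairRDDFunctions; the actual
  // reduction runs on a local Seq instead of an RDD.
  class RddLike[K, V](data: Seq[(K, V)]) {
    def reduceByKey(partitioner: Partitioner, func: (V, V) => V): Map[K, V] =
      reduceByKey(func)
    def reduceByKey(func: (V, V) => V, numPartitions: Int): Map[K, V] =
      reduceByKey(func)
    def reduceByKey(func: (V, V) => V): Map[K, V] =
      data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(func) }
  }

  // Copies the two PairDStreamFunctions.reduceByKey overloads: both take the
  // function as their first argument.
  class DStreamLike[K, V](data: Seq[(K, V)]) {
    private val local = new RddLike(data)
    def reduceByKey(reduceFunc: (V, V) => V, numPartitions: Int): Map[K, V] =
      local.reduceByKey(reduceFunc)
    def reduceByKey(reduceFunc: (V, V) => V, partitioner: Partitioner): Map[K, V] =
      local.reduceByKey(reduceFunc)
  }

  val pairs: Seq[(String, Int)] = Seq("hi", "what", "hi").map((_, 1))

  // Compiles: only (func, numPartitions) matches the argument shapes, so the
  // lambda is typed against (Int, Int) => Int.
  val fromRdd = new RddLike(pairs).reduceByKey((x, y) => x + y, 4)

  // The question reports that this form fails for the DStream version, so it
  // is kept as a comment:
  // new DStreamLike(pairs).reduceByKey((x, y) => x + y, 4)

  // Compiles: with explicit parameter types the lambda can be typed before an
  // overload is chosen, and the Int argument then rules out the Partitioner one.
  val fromStream = new DStreamLike(pairs).reduceByKey((x: Int, y: Int) => x + y, 4)

  def main(args: Array[String]): Unit = {
    println(fromRdd)
    println(fromStream)
  }
}
```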
