Spark UnionRDD cannot be cast to HadoopRDD

Asked: 2016-04-05 21:43:00

Tags: scala hadoop apache-spark rdd

I need to combine several RDDs of type RDD[(LongWritable, Text)] in a for loop. For reasons I can't avoid, I have to cast the final RDD to HadoopRDD[LongWritable, Text], but I get this exception:

    java.lang.ClassCastException: org.apache.spark.rdd.UnionRDD cannot be cast to org.apache.spark.rdd.HadoopRDD

The cast works fine as long as no union is involved.

Here is the code:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.rdd.{HadoopRDD, RDD}

    var allRdd: RDD[(LongWritable, Text)] = null

    for (file <- fileList) {
      val rd = sc.hadoopFile(file.toString,
                             classOf[TextInputFormat],
                             classOf[LongWritable],
                             classOf[Text],
                             sc.defaultMinPartitions)

      // Casting a single per-file RDD works fine:
      // val hRdd = rd.asInstanceOf[HadoopRDD[LongWritable, Text]]

      if (allRdd == null) {
        allRdd = rd
      } else {
        allRdd = allRdd.union(rd)
      }
    }

    // This line throws the exception:
    // java.lang.ClassCastException: org.apache.spark.rdd.UnionRDD
    //     cannot be cast to org.apache.spark.rdd.HadoopRDD
    val allHadoopRdd = allRdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
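
For reference, here is a minimal sketch of the same pattern written with sc.union instead of pairwise union calls (assuming the same sc and a fileList of paths, as above). Each per-file RDD is concretely a HadoopRDD, so the individual casts succeed, but the combined result is not a HadoopRDD either way, so the final downcast fails just the same:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.rdd.{HadoopRDD, RDD}

    // Each element returned by sc.hadoopFile is a HadoopRDD under the hood,
    // so these per-file casts all succeed.
    val perFileRdds: Seq[HadoopRDD[LongWritable, Text]] = fileList.map { file =>
      sc.hadoopFile(file.toString,
                    classOf[TextInputFormat],
                    classOf[LongWritable],
                    classOf[Text],
                    sc.defaultMinPartitions)
        .asInstanceOf[HadoopRDD[LongWritable, Text]]
    }

    // sc.union combines them in one step, but its result is a UnionRDD,
    // so casting `combined` would throw the same ClassCastException.
    val combined: RDD[(LongWritable, Text)] = sc.union(perFileRdds)

In other words, any form of union produces a different concrete RDD subclass, which is why only the single-file cast succeeds.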

Can anyone tell me what might be wrong here? Is there another way to combine the RDDs? Thanks.

0 Answers:

No answers yet.