I need to combine several RDDs of type RDD[(LongWritable, Text)] in a for loop. For a certain reason I have to cast the final RDD to HadoopRDD[LongWritable, Text], but I get this exception:

java.lang.ClassCastException: org.apache.spark.rdd.UnionRDD cannot be cast to org.apache.spark.rdd.HadoopRDD

The cast works fine when no union is involved. Here is the code:
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.rdd.{HadoopRDD, RDD}

var allRdd: RDD[(LongWritable, Text)] = null
for (file <- fileList) {
  val rd = sc.hadoopFile(file.toString(),
    classOf[TextInputFormat],
    classOf[LongWritable],
    classOf[Text],
    sc.defaultMinPartitions)
  // this is fine:
  // val hRdd = rd.asInstanceOf[HadoopRDD[LongWritable, Text]]
  if (allRdd == null) {
    allRdd = rd
  } else {
    allRdd = allRdd.union(rd)
  }
}
// this line throws the exception:
// java.lang.ClassCastException: org.apache.spark.rdd.UnionRDD
// cannot be cast to org.apache.spark.rdd.HadoopRDD
val allHadoopRdd = allRdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
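As a sanity check, printing the runtime class of the combined RDD shows it is no longer a HadoopRDD, which matches the exception message:

println(allRdd.getClass.getName)
// org.apache.spark.rdd.UnionRDD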
Could anyone tell me what might be going wrong here? Is there any other way to combine the RDDs? Thanks.
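Edit: to clarify what I'm after, here is a minimal sketch of one direction I'm considering, assuming the HadoopRDD-specific work can be done per file before the union (perFileRdds and combined are just illustrative names, not code I have running):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.rdd.{HadoopRDD, RDD}

// Cast each per-file RDD while it is still a HadoopRDD, do the
// HadoopRDD-specific work there, and only union the results; the
// combined RDD is then never expected to be a HadoopRDD itself.
val perFileRdds = for (file <- fileList) yield {
  val rd = sc.hadoopFile(file.toString(),
    classOf[TextInputFormat],
    classOf[LongWritable],
    classOf[Text],
    sc.defaultMinPartitions)
  // this cast succeeds because rd really is a HadoopRDD
  rd.asInstanceOf[HadoopRDD[LongWritable, Text]]
}
val combined: RDD[(LongWritable, Text)] = sc.union(perFileRdds.toSeq)

That way each cast happens on an RDD that actually is a HadoopRDD, and the union only ever has to produce a plain RDD[(LongWritable, Text)]. Would that be a reasonable approach?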