I want to union two RDDs: one comes from an RDBMS and one from HDFS.
When the HDFS file does not exist, everything works fine. When the file exists and the two RDDs are unioned, Spark gets stuck.
Log:
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 200 blocks
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/24 20:01:29 INFO Executor: Finished task 2.0 in stage 5.0 (TID 409). 2074 bytes result sent to driver
16/11/24 20:01:29 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 409) in 76 ms on localhost (1/3)
It stops here.
Code:
// Parenthesized so .cache() applies to the result of either branch,
// not only to the else branch as in the original layout.
val allUserAction = (if (exists) {
  val textFile = sc.textFile(historyHDFSFilePath).map { line =>
    //val textFile = sc.textFile("file:///Volumes/disk02/Desktop/part-00000").map { line =>
    val values = line.split(",")
    if (values.length == 13) {
      (values(0).toLong, (values(1), values(2).toLong, values(3).toLong,
        values(4).toInt, values(5).toLong, values(6).toInt, values(7).toBoolean, false, values(8).toBoolean,
        values(9).toBoolean, values(10).toBoolean, values(11).toInt, values(12).toInt))
    } else {
      // sentinel record for malformed lines
      (0L, ("", 0L, 0L, 0, 0L, 0, false, false, false, false, false, 0, 0))
    }
  }
  textFile.union(registerUserSourceWithOp)
} else {
  registerUserSourceWithOp
}).cache()
allUserAction.collect().foreach(println)
If I copy the file from HDFS to my local machine and read it from there, it works fine. How can I fix this? Thanks!
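For reference, here is a minimal sketch of an alternative way to parse the lines: instead of emitting a sentinel tuple for malformed rows, a parsing function can return an Option so bad lines (or lines with unparsable numbers, which would otherwise throw and kill a task) are dropped via flatMap before the union. The function name parseLine is my own; the field layout is taken from the code above.

```scala
object LineParser {
  // Parse one 13-field CSV line into the (key, value-tuple) shape used above.
  // Returns None for malformed lines so they can be dropped with flatMap,
  // instead of polluting the union with sentinel records.
  def parseLine(line: String): Option[(Long, (String, Long, Long, Int, Long, Int,
      Boolean, Boolean, Boolean, Boolean, Boolean, Int, Int))] = {
    val v = line.split(",")
    if (v.length == 13)
      try {
        Some((v(0).toLong, (v(1), v(2).toLong, v(3).toLong,
          v(4).toInt, v(5).toLong, v(6).toInt, v(7).toBoolean, false, v(8).toBoolean,
          v(9).toBoolean, v(10).toBoolean, v(11).toInt, v(12).toInt)))
      } catch {
        // NumberFormatException (from toLong/toInt) and the error from
        // toBoolean both extend IllegalArgumentException
        case _: IllegalArgumentException => None
      }
    else None
  }
}
```

In the job this would be used as, for example, `sc.textFile(historyHDFSFilePath).flatMap(LineParser.parseLine).union(registerUserSourceWithOp)`.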