Spark 1.6.3 gets stuck when unioning RDDs

Date: 2016-11-24 12:07:51

Tags: hadoop apache-spark

I want to union two RDDs: one loaded from an RDBMS and one from HDFS.

When the HDFS file does not exist, everything works fine. When it does exist and the two RDDs are unioned, Spark gets stuck.

Log:

16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 200 blocks
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/24 20:01:29 INFO Executor: Finished task 2.0 in stage 5.0 (TID 409). 2074 bytes result sent to driver
16/11/24 20:01:29 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 409) in 76 ms on localhost (1/3)

It stops here.

Code:

val allUserAction = (if (exists) {
  val textFile = sc.textFile(historyHDFSFilePath).map { line =>
  //val textFile = sc.textFile("file:///Volumes/disk02/Desktop/part-00000").map { line =>
    // Parse each line into (userId, 13-field tuple); fall back to a default
    // row when the line does not have exactly 13 comma-separated values.
    val values = line.split(",")
    if (values.length == 13) {
      (values(0).toLong, (values(1), values(2).toLong, values(3).toLong,
        values(4).toInt, values(5).toLong, values(6).toInt, values(7).toBoolean, false, values(8).toBoolean,
        values(9).toBoolean, values(10).toBoolean, values(11).toInt, values(12).toInt))
    } else {
      (0L, ("", 0L, 0L, 0, 0L, 0, false, false, false, false, false, 0, 0))
    }
  }
  textFile.union(registerUserSourceWithOp)
} else {
  registerUserSourceWithOp
}).cache() // parentheses added so cache() applies to both branches, not only the else block

allUserAction.collect().foreach(println)
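
For what it's worth, toLong/toInt/toBoolean throw on malformed fields, so a bad record would normally fail the task rather than hang it, but here is a hardened variant of the parse (a sketch using scala.util.Try; it assumes the same 13-field CSV layout as above and is not a confirmed fix for the hang):

import scala.util.Try

val defaultRow = (0L, ("", 0L, 0L, 0, 0L, 0, false, false, false, false, false, 0, 0))
val parsed = sc.textFile(historyHDFSFilePath).map { line =>
  val v = line.split(",")
  // Any field that fails to parse sends the whole line to the default row
  // instead of throwing NumberFormatException/IllegalArgumentException in the task.
  Try {
    (v(0).toLong, (v(1), v(2).toLong, v(3).toLong,
      v(4).toInt, v(5).toLong, v(6).toInt, v(7).toBoolean, false, v(8).toBoolean,
      v(9).toBoolean, v(10).toBoolean, v(11).toInt, v(12).toInt))
  }.getOrElse(defaultRow)
}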

If I copy the file from HDFS to the local machine and read it from there instead, it works fine. How can I fix this? Thanks!
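
A quick isolation check (a debugging sketch, assuming the same sc, historyHDFSFilePath, and registerUserSourceWithOp as above) is to force each RDD separately before the union, to see which side actually stalls:

val historyLines = sc.textFile(historyHDFSFilePath)
println(s"history count:  ${historyLines.count()}")             // stalls here => HDFS read problem
println(s"register count: ${registerUserSourceWithOp.count()}") // stalls here => RDBMS side
println(s"union count:    ${historyLines.map(_ => 1L).union(registerUserSourceWithOp.map(_ => 1L)).count()}")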

0 Answers:

No answers yet.