I'm trying to write an RDD[String] to HDFS (from spark-shell) using:
output.saveAsTextFile("hdfs://localhost:9000/datasets/result")
However, it just hangs, and the job doesn't even show up in the web UI. I have to kill the SparkSubmit process.
I read the data from HDFS with:
val input = sc.textFile("hdfs://localhost:9000/datasets/data.csv")
output.collect succeeds, and writing to a local file works fine.
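For reference, a minimal sketch of the two checks that do succeed in spark-shell (the file:///tmp/result path is just an example of my own, not the actual path I used):

output.collect().foreach(println)           // materializing the RDD on the driver works
output.saveAsTextFile("file:///tmp/result") // writing to the local filesystem works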
I'm using Spark 1.4 and Hadoop 2.6. Everything runs on my local machine.
Any ideas?
The comments made me realize I should turn on DEBUG-level logging. An initial extract of the log is below. There is something about connections being closed, but I read data from HDFS at the start of the same short script, so I'm confused.
It must be related to local routing. I replaced localhost with 127.0.0.1, and it's working now.
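A sketch of the working call, i.e. the same save with the loopback IP substituted for the hostname:

output.saveAsTextFile("hdfs://127.0.0.1:9000/datasets/result")

The debug log below hints at the same issue: the line "Connecting to localhost/81.200.64.50:9000" shows localhost resolving to an external address rather than 127.0.0.1.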
15/06/30 17:04:42 DEBUG ClosureCleaner: +++ Cleaning closure <function1> (org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1$$anonfun$30) +++
15/06/30 17:04:42 DEBUG ClosureCleaner: + declared fields: 1
15/06/30 17:04:42 DEBUG ClosureCleaner: public static final long org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1$$anonfun$30.serialVersionUID
15/06/30 17:04:42 DEBUG ClosureCleaner: + declared methods: 2
15/06/30 17:04:42 DEBUG ClosureCleaner: public final java.lang.Object org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1$$anonfun$30.apply(java.lang.Object)
15/06/30 17:04:42 DEBUG ClosureCleaner: public final scala.collection.Iterator org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1$$anonfun$30.apply(scala.collection.Iterator)
15/06/30 17:04:42 DEBUG ClosureCleaner: + inner classes: 1
15/06/30 17:04:42 DEBUG ClosureCleaner: org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1$$anonfun$30$$anonfun$apply$49
15/06/30 17:04:42 DEBUG ClosureCleaner: + outer classes: 0
15/06/30 17:04:42 DEBUG ClosureCleaner: + outer objects: 0
15/06/30 17:04:42 DEBUG ClosureCleaner: + populating accessed fields because this is the starting closure
15/06/30 17:04:42 DEBUG ClosureCleaner: + fields accessed by starting closure: 0
15/06/30 17:04:42 DEBUG ClosureCleaner: + there are no enclosing objects!
15/06/30 17:04:42 DEBUG ClosureCleaner: +++ closure <function1> (org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1$$anonfun$30) is now cleaned +++
15/06/30 17:04:42 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
15/06/30 17:04:42 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
15/06/30 17:04:42 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
15/06/30 17:04:42 DEBUG BlockReaderLocal: dfs.domain.socket.path =
15/06/30 17:04:42 DEBUG DFSClient: No KeyProvider found.
15/06/30 17:04:43 DEBUG Client: IPC Client (832019786) connection to localhost/127.0.0.1:9000 from user: closed
15/06/30 17:04:43 DEBUG Client: IPC Client (832019786) connection to localhost/127.0.0.1:9000 from user: stopped, remaining connections 0
15/06/30 17:04:48 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@4eb1e256,BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$L]
15/06/30 17:04:48 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@4eb1e256,BlockManagerId(driver, localhost, 51349)),true)
15/06/30 17:04:48 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.290877 ms) AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@4eb1e256,BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$L]
15/06/30 17:04:48 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$M]
15/06/30 17:04:48 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51349)),true)
15/06/30 17:04:48 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.257991 ms) AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$M]
15/06/30 17:04:57 DEBUG RetryUtils: multipleLinearRandomRetry = null
15/06/30 17:04:57 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@7d1fc150
15/06/30 17:04:57 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
15/06/30 17:04:57 DEBUG PairRDDFunctions: Saving as hadoop file of type (NullWritable, Text)
15/06/30 17:04:57 DEBUG Client: The ping interval is 60000 ms.
15/06/30 17:04:57 DEBUG Client: Connecting to localhost/81.200.64.50:9000
15/06/30 17:04:58 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@7aba4180,BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$N]
15/06/30 17:04:58 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@7aba4180,BlockManagerId(driver, localhost, 51349)),true)
15/06/30 17:04:58 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.277965 ms) AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@7aba4180,BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$N]
15/06/30 17:04:58 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$O]
15/06/30 17:04:58 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51349)),true)
15/06/30 17:04:58 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.258037 ms) AkkaMessage(BlockManagerHeartbeat(BlockManagerId(driver, localhost, 51349)),true) from Actor[akka://sparkDriver/temp/$O]