我构建了一个spark streaming应用程序。这个应用程序从socket读取数据,然后计算,然后将结果写入hdfs。但是应用程序在A hadoop集群中运行,B hadoop集群中的hdfs。以下是我的代码:
if (args.length < 2) {
System.out.println("Usage: StreamingWriteHdfs hostname port")
System.exit(-1)
}
val conf = new SparkConf()
conf.setAppName("StreamingWriteHdfs")
val ssc = new StreamingContext(conf, Durations.seconds(10))
ssc.checkpoint("/tmp")
val hostname: String = args(0)
val port :Int = Integer.parseInt(args(1))
val lines = ssc.socketTextStream(hostname, port)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()
//TODO write to hdfs
wordCounts.saveAsHadoopFiles("hdfs://plumber/tmp/test/streaming",
"out",
classOf[Text],
classOf[IntWritable],
classOf[TextOutputFormat[Text, IntWritable]])
ssc.start()
ssc.awaitTermination()
在群集中运行此appplication时,请获取此execption:
java.lang.IllegalArgumentException: java.net.UnknownHostException:plumber
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: plumber
B hadoop集群的 fs.defaultFS 是 hdfs://管道工。
有些人可以帮帮我! thxs。答案 0 :(得分:0)
我认为您需要修改主机名,如
&#34; HDFS://管道工:8020 / TMP /测试/流&#34 ;.