Question

我构建了一个spark streaming应用程序。这个应用程序从socket读取数据，然后计算，然后将结果写入hdfs。但是应用程序在A hadoop集群中运行，B hadoop集群中的hdfs。以下是我的代码：

if (args.length < 2) {
  System.out.println("Usage: StreamingWriteHdfs hostname port")
  System.exit(-1)
}

val conf = new SparkConf()
conf.setAppName("StreamingWriteHdfs")

val ssc = new StreamingContext(conf, Durations.seconds(10))
ssc.checkpoint("/tmp")

val hostname: String = args(0)
val port :Int = Integer.parseInt(args(1))

val lines = ssc.socketTextStream(hostname, port)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)

wordCounts.print()

//TODO write to hdfs
wordCounts.saveAsHadoopFiles("hdfs://plumber/tmp/test/streaming",
  "out",
  classOf[Text],
  classOf[IntWritable],
  classOf[TextOutputFormat[Text, IntWritable]])

ssc.start()
ssc.awaitTermination()

在群集中运行此appplication时，请获取此execption：

java.lang.IllegalArgumentException: java.net.UnknownHostException:plumber
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: plumber

B hadoop集群的 fs.defaultFS 是 hdfs：//管道工。

有些人可以帮帮我！ thxs。

Answer 1

我认为您需要修改主机名，如

＆＃34; HDFS：//管道工：8020 / TMP /测试/流＆＃34 ;.

[SPARK]：java.lang.IllegalArgumentException：java.net.UnknownHostException：管道工

1 个答案: