I am trying to read data from my HDFS at the location given in the code below, but I don't get any data back because it throws a ConnectionException.
I am also attaching the log file. What is the right port number for Hadoop? Should we be pointing at 50070?
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

object random {
  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "D:\\Softwares\\Hadoop")
    val conf = new SparkConf().setMaster("local").setAppName("Hello")
    val sc = new SparkContext(conf)

    // Open the file through the raw Hadoop FileSystem API
    val hdfs = FileSystem.get(new URI("hdfs://104.211.213.47:50070/"), new Configuration())
    val path = new Path("/user/m1047068/retail/logerrors.txt")
    val stream = hdfs.open(path)

    // Lazily read lines, checking each line for null so every existing line is printed
    def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
    readLines.takeWhile(_ != null).foreach(line => println(line))
  }
}
--------------------------------------------------------------------------------
Here is the log I get. I don't understand this exception, as I am new to Spark.
2018-09-17 14:50:51 INFO SparkContext:54 - Running Spark version 2.3.0
2018-09-17 14:50:51 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-09-17 14:50:51 INFO SparkContext:54 - Submitted application: Hello
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing view acls to: M1047068
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing modify acls to: M1047068
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing view acls groups to:
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing modify acls groups to:
2018-09-17 14:50:51 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(M1047068); groups with view permissions: Set(); users with modify permissions: Set(M1047068); groups with modify permissions: Set()
2018-09-17 14:50:52 INFO Utils:54 - Successfully started service 'sparkDriver' on port 51772.
2018-09-17 14:50:52 INFO SparkEnv:54 - Registering MapOutputTracker
2018-09-17 14:50:52 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-09-17 14:50:52 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-09-17 14:50:52 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-09-17 14:50:52 INFO DiskBlockManager:54 - Created local directory at C:\Users\M1047068\AppData\Local\Temp\blockmgr-682d85a7-831e-4178-84de-5ade348a45f4
2018-09-17 14:50:52 INFO MemoryStore:54 - MemoryStore started with capacity 896.4 MB
2018-09-17 14:50:52 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-09-17 14:50:53 INFO log:192 - Logging initialized @3046ms
2018-09-17 14:50:53 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-09-17 14:50:53 INFO Server:414 - Started @3188ms
2018-09-17 14:50:53 INFO AbstractConnector:278 - Started ServerConnector@493dc226{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-17 14:50:53 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16ce702d{/jobs,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@40238dd0{/jobs/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7776ab{/jobs/job,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@dbd8e44{/jobs/job/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@51acdf2e{/stages,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6a55299e{/stages/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2f1de2d6{/stages/stage,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a0baae5{/stages/stage/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7ac0e420{/stages/pool,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@289710d9{/stages/pool/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5a18cd76{/storage,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3da30852{/storage/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@403f0a22{/storage/rdd,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@503ecb24{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4c51cf28{/environment,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6995bf68{/environment/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5143c662{/executors,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@77825085{/executors/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3568f9d2{/executors/threadDump,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@71c27ee8{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e7dd664{/static,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4748a0f9{/,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b14918a{/api,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@77d67cf3{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6dee4f1b{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://G1C2ML15621.mindtree.com:4040
2018-09-17 14:50:53 INFO Executor:54 - Starting executor ID driver on host localhost
2018-09-17 14:50:53 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51781.
2018-09-17 14:50:53 INFO NettyBlockTransferService:54 - Server created on G1C2ML15621.mindtree.com:51781
2018-09-17 14:50:53 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-09-17 14:50:53 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO BlockManagerMasterEndpoint:54 - Registering block manager G1C2ML15621.mindtree.com:51781 with 896.4 MB RAM, BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6cbcf243{/metrics/json,null,AVAILABLE,@Spark}
Exception in thread "main" java.net.ConnectException: Call From G1C2ML15621/172.17.124.224 to 104.211.213.47:50070 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at random$.main(random.scala:20)
at random.main(random.scala)
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 25 more
2018-09-17 14:51:00 INFO SparkContext:54 - Invoking stop() from shutdown hook
2018-09-17 14:51:00 INFO AbstractConnector:318 - Stopped Spark@493dc226{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-17 14:51:00 INFO SparkUI:54 - Stopped Spark web UI at http://G1C2ML15621.mindtree.com:4040
2018-09-17 14:51:00 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-09-17 14:51:00 INFO MemoryStore:54 - MemoryStore cleared
2018-09-17 14:51:00 INFO BlockManager:54 - BlockManager stopped
2018-09-17 14:51:00 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-09-17 14:51:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-09-17 14:51:00 INFO SparkContext:54 - Successfully stopped SparkContext
2018-09-17 14:51:00 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-17 14:51:00 INFO ShutdownHookManager:54 - Deleting directory C:\Users\M1047068\AppData\Local\Temp\spark-84d5b3c8-a609-42da-8e5e-5492400f309d
Answer 0 (score: 0)
Spark cannot read from WebHDFS. Port 50070 is the NameNode's web UI / WebHDFS port, not its RPC port.
You need to use the port number that appears in the fs.defaultFS property of core-site.xml.
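For reference, the relevant core-site.xml entry looks like the sketch below. The value 8020 is only a common default used as an illustration, not something taken from the question; you can print your cluster's actual value with: hdfs getconf -confKey fs.defaultFS

<!-- core-site.xml: the NameNode RPC address HDFS clients must connect to.
     104.211.213.47:8020 is a placeholder; check your own cluster's value. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://104.211.213.47:8020</value>
</property>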
If you copy the Hadoop XML configuration files into the conf folder of your Spark installation and define the HADOOP_CONF_DIR
environment variable, you don't need to set the hadoop.home.dir property yourself.
As of Spark 2, you want to use SparkSession and read the file with the textFile method from the session.
You will not need to create a raw FileSystem object yourself in Spark.
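Here is a minimal sketch of that SparkSession approach. The host and file path are copied from the question, but the port 8020 is an assumed placeholder; replace it with whatever your fs.defaultFS actually says.

import org.apache.spark.sql.SparkSession

object ReadWithSparkSession {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("Hello")
      .getOrCreate()

    // textFile returns a Dataset[String]; Spark resolves the HDFS connection
    // from the URI (or from HADOOP_CONF_DIR), so no manual FileSystem, Path,
    // or URI plumbing is needed.
    // NOTE: 8020 is an assumed placeholder for the fs.defaultFS port.
    val lines = spark.read.textFile(
      "hdfs://104.211.213.47:8020/user/m1047068/retail/logerrors.txt")

    lines.collect().foreach(println)

    spark.stop()
  }
}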