I am trying to create a file in HDFS from outside the cluster (using the HDFS APIs) as follows:
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "192.168.56.101:54310");
conf.set("fs.default.name", "hdfs://192.168.56.101:54311");
FileSystem fs = FileSystem.get(conf);
fs.createNewFile(new Path("/app/hadoop/tmp/data/tools.txt"));
I get the following error:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to job tracker: org.apache.hadoop.hdfs.protocol.ClientProtocol
at org.apache.hadoop.mapred.JobTracker.getProtocolVersion(JobTracker.java:370)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy1.getProtocolVersion(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at LineCounter.main(LineCounter.java:107)
Answer 0 (score: 0)
The namenode exposes two ports: an IPC port (8020 by default) and a web UI port (50070 by default). To access the HDFS file system from a remote machine you do not need to set mapred.job.tracker at all; setting fs.default.name is enough. Also make sure the HDFS client library on the remote machine is the same version as the cluster, and that the namenode is reachable from that machine.
Before connecting to the remote HDFS, find the value of fs.default.name (the default port is 8020) by checking the core-site.xml file on any node or edge node of the cluster. You can confirm the IPC port is correct by running the following command on any node:
hadoop fs -ls hdfs://192.168.56.101:54310/
If you can list HDFS with that command, you can pass the same HDFS URI to the conf.set method in your code.
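As a minimal sketch, assuming the namenode IPC port really is 54310 (as verified with the hadoop fs -ls check above) and the target directory is writable, the client code would only need the fs.default.name setting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCreateFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the namenode IPC port only;
        // mapred.job.tracker is not needed for HDFS access.
        conf.set("fs.default.name", "hdfs://192.168.56.101:54310");

        FileSystem fs = FileSystem.get(conf);
        fs.createNewFile(new Path("/app/hadoop/tmp/data/tools.txt"));
        fs.close();
    }
}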
Answer 1 (score: 0)
Please check that the namenode and jobtracker daemons are running and that the ports you mentioned are correct.
Answer 2 (score: 0)
I use the following and it works on a single-node cluster running in a Linux VM:
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:9000");
I suggest you try 9000 as the port number; otherwise, find the correct port to use in the Hadoop configuration file core-site.xml. Mine looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>~/hacking/hd-data/tmp</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>~/hacking/hd-data/snn</value>
  </property>
</configuration>
Also, as sachinjose pointed out, I recommend you remove the following line from your code:
conf.set("mapred.job.tracker", "192.168.56.101:54310");