I am trying to create a file in HDFS from outside the cluster (using the HDFS APIs) as follows:
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "192.168.56.101:54310");
conf.set("fs.default.name", "hdfs://192.168.56.101:54311");
FileSystem fs = FileSystem.get(conf);
fs.createNewFile(new Path("/app/hadoop/tmp/data/tools.txt"));
I get the following error:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to job tracker: org.apache.hadoop.hdfs.protocol.ClientProtocol
at org.apache.hadoop.mapred.JobTracker.getProtocolVersion(JobTracker.java:370)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy1.getProtocolVersion(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at LineCounter.main(LineCounter.java:107)
Answer 0 (score: 0)
The namenode exposes two ports: an IPC port (8020 by default) and a web UI port (50070 by default). To access the HDFS file system from a remote machine you do not need to set mapred.job.tracker at all; setting fs.default.name is enough. Also make sure the HDFS client library on the remote machine is the same version as the cluster, and that the namenode is reachable from that machine.
Before connecting to the remote HDFS, find the value of fs.default.name (the default port is 8020) by checking the core-site.xml file on any node or edge node of the cluster. You can confirm the IPC port is correct by running the following command on any node:
hadoop fs -ls hdfs://192.168.56.101:54310/
If you can list HDFS with that command, you can pass the same HDFS URI to the conf.set method in your code.
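As a minimal sketch, assuming the namenode IPC port really is 54310 (as verified with the hadoop fs -ls check above) and the target directory is writable, the client code would only need the fs.default.name setting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCreateFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the namenode IPC port only;
        // mapred.job.tracker is not needed for HDFS access.
        conf.set("fs.default.name", "hdfs://192.168.56.101:54310");

        FileSystem fs = FileSystem.get(conf);
        fs.createNewFile(new Path("/app/hadoop/tmp/data/tools.txt"));
        fs.close();
    }
}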
Answer 1 (score: 0)
Please check that the namenode and jobtracker daemons are running and that the ports you mentioned are correct.
Answer 2 (score: 0)
I use the following and it works on a single-node cluster running in a Linux VM:
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:9000");
I suggest you try 9000 as the port number; otherwise, find the correct port to use in the Hadoop configuration file core-site.xml. Mine looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>~/hacking/hd-data/tmp</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>~/hacking/hd-data/snn</value>
  </property>
</configuration>
Also, as sachinjose pointed out, I recommend you remove the following line from your code:
conf.set("mapred.job.tracker", "192.168.56.101:54310");