I am trying to connect to a remote HDFS instance
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hostName:8020");
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
FileSystem fs = FileSystem.get(conf);
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(fs.getHomeDirectory(), false);
while (ri.hasNext()) {
    LocatedFileStatus lfs = ri.next();
    // log.debug(lfs.getPath().toString());
}
fs.close();
These are my Maven dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-examples</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.1</version>
</dependency>
This is the output of the hadoop version command on my remote node:
hadoop version
Hadoop 2.7.1.2.3.0.0-2557
But this is what I get:
Exception in thread "main" java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:217)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2624)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2634)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at filecheck.HdfsTest.main(HdfsTest.java:21)
This is the line that causes the error:
FileSystem fs = FileSystem.get(conf);
Any idea why this is happening?
EDIT: I tried Manjunath's answer.
This is what I got:
ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2807)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at filecheck.HdfsTest.main(HdfsTest.java:27)
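As an aside, the winutils error above is the usual symptom of running a Hadoop client on Windows without a local Hadoop home. A common workaround (a sketch, not part of the original answers; "C:\hadoop" is a placeholder path) is to set hadoop.home.dir before any Hadoop class is loaded, since org.apache.hadoop.util.Shell reads it in a static initializer:

```java
public class WinutilsWorkaround {
    public static void main(String[] args) {
        // Must run before the first Hadoop class (e.g. Configuration) is touched,
        // because Shell's static initializer looks up the winutils.exe path.
        // "C:\\hadoop" is a placeholder; point it at a folder containing bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```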
15/11/16 09:48:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalArgumentException: Pathname from hdfs://hostName:8020 is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:940)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:927)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:872)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:868)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:886)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1787)
at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1783)
at filecheck.HdfsTest.main(HdfsTest.java:29)
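The second exception says the pathname from hdfs://hostName:8020 is not a valid DFS filename. One plausible reading (my interpretation, not from the original post) is that a URI with an authority but no trailing slash has an empty path component, which DistributedFileSystem rejects; plain java.net.URI shows the difference:

```java
import java.net.URI;

public class EmptyPathDemo {
    public static void main(String[] args) {
        // Without a trailing slash the hierarchical URI has an empty path,
        // which is exactly what "is not a valid DFS filename" complains about.
        URI bare = URI.create("hdfs://hostName:8020");
        URI rooted = URI.create("hdfs://hostName:8020/");
        System.out.println("[" + bare.getPath() + "]");
        System.out.println("[" + rooted.getPath() + "]");
    }
}
```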
Answer 0 (score: 1)
My HDFS client code using hadoop-hdfs also required this dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.1</version>
</dependency>
I was using the Hortonworks repository:
<repository>
    <id>repo.hortonworks.com</id>
    <name>Hortonworks HDP Maven Repository</name>
    <url>http://repo.hortonworks.com/content/repositories/releases/</url>
</repository>
I think you are picking up the wrong version of the FileSystem class.
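One generic way to check which jar a suspect class is actually loaded from (a diagnostic sketch I am adding, not part of the original answer) is to print its code source; with Hadoop on the classpath you would pass org.apache.hadoop.fs.FileSystem.class:

```java
public class WhichJar {
    // Returns where a class was loaded from; useful for spotting mixed
    // library versions, e.g. an old FileSystem pulled in by a 1.x artifact.
    static String locationOf(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // In a Hadoop project, replace WhichJar.class with
        // org.apache.hadoop.fs.FileSystem.class to see the offending jar.
        System.out.println(locationOf(WhichJar.class));
    }
}
```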
Answer 1 (score: 1)
The exception occurs in the getScheme() method of FileSystem.java, which does nothing but throw an UnsupportedOperationException:
public String getScheme() {
    throw new UnsupportedOperationException("Not implemented by the " + getClass().getSimpleName() + " FileSystem implementation");
}
Your code is calling the getScheme() method of the FileSystem class, instead of the getScheme() method of the DistributedFileSystem class. The getScheme() method of DistributedFileSystem returns:
@Override
public String getScheme() {
    return HdfsConstants.HDFS_URI_SCHEME;
}
So, to overcome this problem, you need to change the "FileSystem.get(conf)" statement as follows:
DistributedFileSystem fs = (DistributedFileSystem) FileSystem.get(conf);
EDIT:
I tried out this program and it works perfectly fine for me. In fact, it works both with and without the cast. Following is my code (the only difference is that I set recursive listing to "true"):
package com.hadooptests;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

import java.io.IOException;

public class HDFSConnect {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://machine:8020");
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

        DistributedFileSystem fs = null;
        try {
            fs = (DistributedFileSystem) FileSystem.get(conf);
            RemoteIterator<LocatedFileStatus> ri;
            ri = fs.listFiles(new Path("hdfs://machine:8020/"), true);
            while (ri.hasNext()) {
                LocatedFileStatus lfs = ri.next();
                System.out.println(lfs.getPath().toString());
            }
            fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
My pom:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.6</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.hadooptests.HDFSConnect</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>
I ran the program as:
java -cp "%CLASSPATH%;hadooptests-1.0-SNAPSHOT.jar" com.hadooptests.HDFSConnect
where CLASSPATH is set to:
.;%HADOOP_HOME%\etc\hadoop\;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\tools\*;%HADOOP_HOME%\share\hadoop\tools\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*
And I got some output:
hdfs://machine:8020/app-logs/machine/logs/application_1439815019232_0001/machine.corp.com_45454
hdfs://machine:8020/app-logs/machine/logs/application_1439815019232_0002/machine.corp.com_45454
hdfs://machine:8020/app-logs/machine/logs/application_1439817471006_0002/machine.corp.com_45454
hdfs://machine:8020/app-logs/machine/logs/application_1439817471006_0003/machine.corp.com_45454
编辑2:
My environment:
Hadoop 2.7.1 on Windows.
I installed HDP 2.3.0, which deploys Hadoop 2.7.1.