I'm learning how to read/write files from HDFS.
Here is the code I use to read:
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = "/user/hadoop/file.txt";
        Configuration conf = new Configuration();
        conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
        conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
The file is there.
However, when I run the code in Eclipse, I get the following:
Exception in thread "main" java.io.FileNotFoundException: File /user/hadoop/file.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
at hadoop.FileSystemCat.main(FileSystemCat.java:22)
I have also tried file:///user/hadoop/file.txt and hdfs:///user/hadoop/file.txt as the path.
For the latter, the error is slightly different:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value>
    </property>
</configuration>
hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop_store/hdfs/namenode/</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop_store/hdfs/datanode/,file:///mnt/hadoop/hadoop_store/hdfs/datanode/</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
Any idea what the problem is?
Thanks.
Answer 0 (score: 3)
You should change the line
FileSystem fs = FileSystem.get(URI.create(uri), conf);
to something like this:
FileSystem fs = FileSystem.get(URI.create("hdfs://localhost"), conf);
That should work if your uri path is in HDFS.
To check whether your uri path is in HDFS, you can run hadoop fs -ls / from the command line.
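Alternatively, you can set the default filesystem on the Configuration itself, so that bare paths like /user/hadoop/file.txt resolve to HDFS. A minimal sketch, assuming the NameNode runs on localhost as in the core-site.xml above:

Configuration conf = new Configuration();
// fs.defaultFS is the current key; fs.default.name is its deprecated alias
conf.set("fs.defaultFS", "hdfs://localhost/");
// With the default FS set, FileSystem.get(conf) returns the HDFS client,
// and a bare path such as /user/hadoop/file.txt is resolved against it
FileSystem fs = FileSystem.get(conf);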
Answer 1 (score: 1)
Add the XML files that carry the HDFS configuration parameters:
Configuration conf = new Configuration();
conf.addResource(new Path("your_hadoop_path/conf/core-site.xml"));
conf.addResource(new Path("your_hadoop_path/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(URI.create(uri), conf);
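After loading those resources, a quick sanity check (a sketch using standard Hadoop configuration keys) is to confirm that the client resolved an hdfs:// filesystem instead of the local one:

// Should print an hdfs:// URI, not file:///
System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
System.out.println("Resolved filesystem: " + fs.getUri());

If it still prints file:///, either the XML paths are wrong or the hadoop-hdfs jar is missing from the classpath; the latter is also a common cause of the "No FileSystem for scheme: hdfs" error above.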
Answer 2 (score: 1)
If you want to read the data of an HDFS file, this code will do it.
package com.yp.util;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHadoopFileData {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        Path hdfsFile = new Path(args[0]);

        // Wrap the HDFS input stream in a reader and print the file line by line;
        // try-with-resources ensures the reader (and underlying stream) is closed
        try (BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(hdfsFile)))) {
            String line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
When you run it from the command line, all the environment settings are handled by hadoop.
Command to run the program (assuming you created Read.jar and the HDFS file is part-r-00000):
hadoop jar Read.jar com.yp.util.ReadHadoopFileData /MyData/part-r-00000
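The question also asks about writing; for completeness, a minimal write sketch under the same Configuration setup (the output path here is hypothetical):

// Requires: import org.apache.hadoop.fs.FSDataOutputStream;
Path out = new Path("/user/hadoop/output.txt"); // hypothetical output path
try (FSDataOutputStream os = hdfs.create(out)) {
    os.writeBytes("hello hdfs\n"); // writeBytes is inherited from DataOutputStream
}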