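I am trying to create a Hadoop sequence file.
I can successfully create the sequence file in HDFS, but when I try to read it back I get a "not a SequenceFile" error. I also checked that the sequence file had been created in HDFS.
Here is my source code, which writes a sequence file to HDFS and reads it back.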
package us.qi.hdfs;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayFile;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileText {
    public static void main(String args[]) throws IOException {
        /** Get Hadoop HDFS command and Hadoop Configuration */
        HDFS_Configuration conf = new HDFS_Configuration();
        HDFS_Test hdfs = new HDFS_Test();
        String uri = "hdfs://slave02:9000/user/hadoop/test.seq";

        /** Get Configuration from HDFS_Configuration Object by using get_conf() */
        Configuration config = conf.get_conf();
        SequenceFile.Writer writer = null;
        SequenceFile.Reader reader = null;
        try {
            Path path = new Path(uri);
            IntWritable key = new IntWritable();
            Text value = new Text();
            writer = SequenceFile.createWriter(config, SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(key.getClass()),
                    ArrayFile.Writer.valueClass(value.getClass()));
            reader = new SequenceFile.Reader(config, SequenceFile.Reader.file(path));
            writer.append(new IntWritable(11), new Text("test"));
            writer.append(new IntWritable(12), new Text("test2"));
            writer.close();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(writer);
            IOUtils.closeStream(reader);
        }
    }
}
This is the error I get:
2018-09-17 17:15:34,267 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-09-17 17:15:38,870 INFO [main] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
java.io.EOFException: hdfs://slave02:9000/user/hadoop/test.seq not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1933)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1892)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1841)
    at us.qi.hdfs.SequenceFileText.main(SequenceFileText.java:36)
Answer 0 (score: 0)
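That was my mistake. I changed some of the source code.
First, I check whether the file already exists in HDFS. If it does not, I create a writer object.
Once the writer has finished, I check the sequence file. After that check, I was able to read the sequence file successfully.
Here is my code. Thanks!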
// Note: `fs` and `logger` are declared outside this excerpt (not shown in the post).
try {
    Path path = new Path(uri);
    IntWritable key = new IntWritable();
    Text value = new Text();

    /** First, check whether the file already exists.
     * If it does not exist in HDFS, create the writer and write the records.
     */
    if (!fs.exists(path)) {
        writer = SequenceFile.createWriter(config, SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(key.getClass()),
                ArrayFile.Writer.valueClass(value.getClass()));
        writer.append(new IntWritable(11), new Text("test"));
        writer.append(new IntWritable(12), new Text("test2"));
        // Close the writer before any reader is opened on the same path.
        writer.close();
    } else {
        logger.info(path + " already exists.");
    }

    /** Create a SequenceFile Reader object. */
    reader = new SequenceFile.Reader(config, SequenceFile.Reader.file(path));
    while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
    }
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    IOUtils.closeStream(writer);
    IOUtils.closeStream(reader);
}
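
For anyone wondering why the order matters: as far as I understand it, a SequenceFile starts with the three magic bytes SEQ followed by a one-byte version number, and the reader checks that header as soon as it is constructed. In the question's code the reader was opened before anything had been appended or the writer closed, so the file probably still looked empty to the reader, which would explain the EOFException. The sketch below reproduces that ordering problem with plain Java I/O on a local temp file. It is only an illustration and does not use the Hadoop API: the class name HeaderOrderingSketch, the helper hasSequenceFileMagic, and the version byte 6 are my own assumptions.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path; // java.nio Path, not org.apache.hadoop.fs.Path

public class HeaderOrderingSketch {
    // Magic bytes assumed to open every SequenceFile; a version byte follows them.
    static final byte[] MAGIC = {'S', 'E', 'Q'};

    /**
     * Reads a 4-byte header (magic + version) from the file.
     * Throws EOFException when the file holds fewer than 4 bytes,
     * which is the situation a reader faces when it is opened too early.
     */
    static boolean hasSequenceFileMagic(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] header = new byte[4];
            in.readFully(header); // EOFException if the stream ends early
            return header[0] == MAGIC[0]
                    && header[1] == MAGIC[1]
                    && header[2] == MAGIC[2];
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("header-sketch", ".seq");
        try {
            // Case 1: the file exists but nothing has been written yet.
            try {
                hasSequenceFileMagic(file);
                System.out.println("header read from an empty file (not expected)");
            } catch (EOFException e) {
                System.out.println("empty file: EOFException while reading the header");
            }

            // Case 2: the header is on disk before the reader opens the file.
            Files.write(file, new byte[] {'S', 'E', 'Q', 6}); // 6 is a hypothetical version byte
            System.out.println("after write: magic found = " + hasSequenceFileMagic(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If that reading is right, the part of my change that actually fixes the error is creating the reader only after writer.close(); the fs.exists(path) check mainly avoids overwriting a file that is already there.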