Question

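I have been looking for a parser that converts generated sequence files (.seq) into plain text files so that I can inspect the intermediate output. I would be glad to hear from anyone who has worked out how to do this.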

Answer 1

I think you can write a SequenceFile reader in a few lines of code, like this:

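import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileDump {
    public static void main(String[] args) throws IOException {
        String uri = "path/to/your/sequence/file";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        // try-with-resources closes the reader even if opening or reading fails
        // (this constructor is deprecated in newer Hadoop releases but still available)
        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
            // The file header records the key and value classes; instantiate them reflectively
            Writable key = (Writable) ReflectionUtils.newInstance(
                    reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(
                    reader.getValueClass(), conf);
            // next() fills key and value in place and returns false at end of file
            while (reader.next(key, value)) {
                System.out.println("Key: " + key + " value: " + value);
            }
        }
    }
}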
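
Since the goal is a text file rather than console output, the println in the loop above can be pointed at a writer instead. Here is a minimal sketch of that part, kept free of Hadoop so it runs on its own. RecordTextWriter, writeRecords, the tab-separated layout, and the output name sequence-dump.txt are all my own assumptions, and the Map.Entry pairs only stand in for the key/value objects that reader.next(key, value) fills.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hadoop or Mahout.
final class RecordTextWriter {

    // Writes one "key<TAB>value" line per record and returns the number of records written.
    static int writeRecords(Iterable<? extends Map.Entry<?, ?>> records, Writer out)
            throws IOException {
        int count = 0;
        for (Map.Entry<?, ?> record : records) {
            // Concatenation calls toString() on key and value, as println does in the reader loop.
            out.write(record.getKey() + "\t" + record.getValue() + "\n");
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in records; in the real loop these would come from reader.next(key, value).
        List<Map.Entry<String, String>> sample = new ArrayList<>();
        sample.add(new SimpleEntry<>("doc-1", "first value"));
        sample.add(new SimpleEntry<>("doc-2", "second value"));

        // "sequence-dump.txt" is an assumed output file name.
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("sequence-dump.txt"), StandardCharsets.UTF_8)) {
            int written = writeRecords(sample, out);
            System.out.println("Wrote " + written + " records");
        }
    }
}
```

In the real reader you would open the BufferedWriter before the while loop and write key + "\t" + value inside it. How readable each line is depends on the toString() of the key and value classes stored in the file.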

Answer 2

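Suppose you have sequence data in HDFS at /ex-seqdata/part-000... so the part-* data is in binary format. You can then run the command hadoop fs -text /ex-seqdata/part* at the command prompt to get the data in human-readable form.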

How to convert a sequence file generated by Mahout into a text file
