I am storing video files of 500 MB or larger in HDFS. Since a file is larger than the block size, it gets split across multiple blocks. I need to fetch and work on the first data block first, because only it contains the sequence header. How can I do this, or how can I find the first data block of a file in Hadoop?
Answer 0 (score: 1)
To read the first block, you can open an InputStream from the FileSystem and read bytes until you reach a predetermined amount (for an example block size of 64 MB, that is 64 * 1024 * 1024 bytes). Here is an example (although 64 MB is a lot of data; if the data you need ends well before the 64 MB mark, reduce bytesLeft accordingly):
import java.io.EOFException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class TestReaderFirstBlock {

    private static final String uri = "hdfs://localhost:9000/path/to/file";

    // Number of bytes to read: one default-sized block (64 MB)
    private static int bytesLeft = 64 * 1024 * 1024;
    private static final byte[] buffer = new byte[4096];

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        // Reading starts at byte 0, so this stream begins in the first block
        InputStream is = fs.open(new Path(uri));
        OutputStream out = System.out;

        while (bytesLeft > 0) {
            int read = is.read(buffer, 0, Math.min(bytesLeft, buffer.length));
            if (read == -1) {
                throw new EOFException("Unexpected end of data");
            }
            out.write(buffer, 0, read);
            bytesLeft -= read;
        }
        IOUtils.closeStream(is);
    }
}
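If you also want to find out where the first block physically lives (for example, its actual length or which datanodes hold its replicas), the FileSystem API exposes the block layout via getFileBlockLocations. The sketch below is a minimal illustration under the same assumptions as above (the hdfs://localhost:9000 URI and path are placeholders); it asks for the locations covering byte 0 of the file, which by definition belong to the first block:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FirstBlockLocation {

    public static void main(String[] args) throws Exception {
        // Placeholder URI, same as in the example above; point this at your cluster
        String uri = "hdfs://localhost:9000/path/to/file";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FileStatus status = fs.getFileStatus(new Path(uri));

        // Ask only for the byte range [0, 1): any block covering byte 0 is the first block
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, 1);
        if (blocks.length > 0) {
            System.out.println("First block length: " + blocks[0].getLength());
            for (String host : blocks[0].getHosts()) {
                System.out.println("Replica on host: " + host);
            }
        }
    }
}
```

Knowing the real length of the first block also lets you set bytesLeft in the reader above precisely, instead of hard-coding 64 MB.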