Question

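I'm trying to build a simplified HDFS (Hadoop Distributed File System) as the final project for a distributed systems course.

So the first thing I tried was writing a program that splits an arbitrary file into blocks (chunks) of arbitrary size.

I found this useful example; the code is: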

package javabeat.net.io;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Split File Example
 *
 * @author Krishna
 *
 */
public class SplitFileExample {
    private static String FILE_NAME = "TextFile.txt";
    private static byte PART_SIZE = 5;
    public static void main(String[] args) {
        File inputFile = new File(FILE_NAME);
        FileInputStream inputStream;
        String newFileName;
        FileOutputStream filePart;
        int fileSize = (int) inputFile.length();
        int nChunks = 0, read = 0, readLength = PART_SIZE;
        byte[] byteChunkPart;
        try {
            inputStream = new FileInputStream(inputFile);
            while (fileSize > 0) {
                if (fileSize <= 5) {
                    readLength = fileSize;
                }
                byteChunkPart = new byte[readLength];
                read = inputStream.read(byteChunkPart, 0, readLength);
                fileSize -= read;
                assert (read == byteChunkPart.length);
                nChunks++;
                newFileName = FILE_NAME + ".part"
                        + Integer.toString(nChunks - 1);
                filePart = new FileOutputStream(new File(newFileName));
                filePart.write(byteChunkPart);
                filePart.flush();
                filePart.close();
                byteChunkPart = null;
                filePart = null;
            }
            inputStream.close();
        } catch (IOException exception) {
            exception.printStackTrace();
        }
    }
}

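But I think there is a big problem: the value of PART_SIZE cannot be greater than 127, otherwise I get error: possible loss of precision.

How can I fix this without completely changing the code?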

Answer 1

The problem is that PART_SIZE is a byte; its maximum value is therefore indeed 127.
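As a minimal sketch of the point above: a Java byte is a signed 8-bit value, so 127 is its ceiling, while an int has room for any realistic part size. The class name PartSizeDemo and the value 128 are illustrative assumptions, not part of the original code. In the question's code, the same change would also mean comparing fileSize against PART_SIZE instead of the hard-coded 5.

```java
public class PartSizeDemo {
    // Declared as int instead of byte, so values above 127 are allowed.
    // 128 is an arbitrary illustrative value.
    static final int PART_SIZE = 128;

    // Allocates one chunk buffer of PART_SIZE bytes.
    static byte[] newChunk() {
        return new byte[PART_SIZE];
    }

    public static void main(String[] args) {
        // Largest value a byte can hold.
        System.out.println(Byte.MAX_VALUE);      // prints 127

        // byte tooBig = 128;  // would not compile: 128 does not fit in a byte

        // An explicit cast compiles, but the value wraps around.
        byte wrapped = (byte) 128;
        System.out.println(wrapped);             // prints -128

        // With an int constant the buffer can be as large as needed.
        System.out.println(newChunk().length);   // prints 128
    }
}
```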

The code you have at the moment is full of problems, though; for one, resources are not handled correctly.

Here is a version using java.nio.file:

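import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

private static final String FILENAME = "TextFile.txt";
private static final int PART_SIZE = xxx; // HERE

public static void main(final String... args)
    throws IOException
{
    final Path file = Paths.get(FILENAME).toRealPath();
    final String filenameBase = file.getFileName().toString();
    final byte[] buf = new byte[PART_SIZE];

    int partNumber = 0;
    Path part;
    int bytesRead;
    byte[] toWrite;

    // try-with-resources closes the stream even if an exception is thrown
    try (
        final InputStream in = Files.newInputStream(file);
    ) {
        while ((bytesRead = in.read(buf)) != -1) {
            part = file.resolveSibling(filenameBase + ".part" + partNumber);
            // a read may return fewer bytes than the buffer holds; write only those
            toWrite = bytesRead == PART_SIZE ? buf : Arrays.copyOf(buf, bytesRead);
            // CREATE_NEW fails if the part file already exists
            Files.write(part, toWrite, StandardOpenOption.CREATE_NEW);
            partNumber++;
        }
    }
}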
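One caveat with a plain read(buf) loop is that InputStream.read is allowed to return fewer bytes than requested before the end of the stream, which could yield parts smaller than PART_SIZE in the middle of the file. The following is a sketch of one way to guard against that, assuming Java 9 or later for InputStream.readNBytes; the class name ChunkSplitter and the method split are hypothetical names introduced for illustration, not part of the answer above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSplitter {
    /**
     * Splits the file into parts of at most partSize bytes, written next to
     * the original as name.part0, name.part1, and so on.
     * Returns the paths of the parts in order.
     */
    public static List<Path> split(Path file, int partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        String base = file.getFileName().toString();
        byte[] buf = new byte[partSize];
        List<Path> parts = new ArrayList<>();

        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // readNBytes (Java 9+) keeps reading until the buffer is full or
            // the stream ends, so only the last part can be short.
            while ((n = in.readNBytes(buf, 0, partSize)) > 0) {
                Path part = file.resolveSibling(base + ".part" + parts.size());
                byte[] toWrite = (n == partSize) ? buf : Arrays.copyOf(buf, n);
                // CREATE_NEW refuses to overwrite an existing part file.
                Files.write(part, toWrite, StandardOpenOption.CREATE_NEW);
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChunkSplitter <file> <partSize>
        List<Path> parts = split(Paths.get(args[0]), Integer.parseInt(args[1]));
        for (Path p : parts) {
            System.out.println(p + " (" + Files.size(p) + " bytes)");
        }
    }
}
```

Taking the part size as an int parameter keeps the 127 limit from the question out of the picture entirely, and returning the list of part paths makes it straightforward to hand the chunks to whatever stores them next.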

Answer 2

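// Assumes Apache PDFBox 2.x; package names and the load call differ in other versions
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

List<PDDocument> Pages = new ArrayList<PDDocument>();
try {
    // Load the PDF to split
    PDDocument document = PDDocument.load(new File(filePath));
    Splitter splitter = new Splitter();
    // Number of pages each resulting document will contain
    splitter.setSplitAtPage(NoOfPagesDocumentWillContain);
    Pages = splitter.split(document);
} catch (Exception e) {
    e.printStackTrace();
}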
