Question

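I'm trying to encode PDF documents to Base64. This works fine for a small number of files (around 2,000), but I have more than 100k files to encode.

Encoding all of these files takes too long. Is there a better way to encode a large data set?

Here is my current approach: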

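 String filepath = doc.getPath().concat(doc.getFilename());

 file = new File(filepath);
    if (file.exists() && !file.isDirectory()) {
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fileInputStreamReader = new FileInputStream(file)) {
            byte[] bytes = new byte[(int) file.length()];
            // A single read() may return fewer bytes than requested, so loop until the buffer is full
            int offset = 0;
            while (offset < bytes.length) {
                int n = fileInputStreamReader.read(bytes, offset, bytes.length - offset);
                if (n < 0) break;
                offset += n;
            }
            encodedfile = Base64.getEncoder().encodeToString(bytes);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }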

Answer 1

Try this:

Figure out how many files you need to encode.

// count() returns a long, so the result cannot be assigned to an int
long files = Files.list(Paths.get(directory)).count();
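
`Files.list` returns a stream backed by an open directory handle, so a slightly more defensive version of this counting step might look like the sketch below. The class name `FileCounter` and the choice to count only regular files (skipping subdirectories) are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class FileCounter {
    /** Counts the regular files directly inside the given directory (not recursive). */
    static long countFiles(Path directory) throws IOException {
        // Files.list keeps a directory handle open, so close it with try-with-resources
        try (Stream<Path> entries = Files.list(directory)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countFiles(dir) + " files to encode");
    }
}
```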

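Split them into batches of a reasonable size for a Java thread to handle. For example, if you have 100k files to encode, split them into 100 lists of 1,000 files each.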

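int currentIndex = 0;
for (File file : filesInDir) {
    // Move on to the next batch once the current one has reached the cap
    List<File> batch = fileMap.get(currentIndex);
    if (batch != null && batch.size() >= cap)
        currentIndex++;
    // Create the batch list on first use so get() never returns null
    fileMap.computeIfAbsent(currentIndex, k -> new ArrayList<>()).add(file);
}
/** It's going to take a little more effort than this, but it's the idea I'm trying to show you. */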
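
As one possible way to fill in that extra effort, the batching step can also be written as a small generic helper that slices a list into chunks of at most `cap` elements. This is only a sketch; `BatchSplitter` and `partition` are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSplitter {
    /** Splits items into consecutive batches holding at most cap elements each. */
    static <T> List<List<T>> partition(List<T> items, int cap) {
        if (cap <= 0) {
            throw new IllegalArgumentException("cap must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += cap) {
            int end = Math.min(start + cap, items.size());
            // Copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 100k files and a cap of 1,000, this yields 100 batches; the last batch is simply shorter when the total is not a multiple of the cap.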

Start each worker thread one after another, provided the computer has resources available.

for (Integer key : fileMap.keySet()) {
     new WorkerThread(fileMap.get(key)).start();
}
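
Starting one thread per batch can create far more threads than the machine has cores. An alternative worth considering, swapped in here for the hand-rolled `WorkerThread`, is a fixed-size `ExecutorService` pool. The sketch below assumes each file fits in memory; `ParallelEncoder`, `encode`, and `encodeAll` are hypothetical names for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelEncoder {
    /** Reads one file fully and returns its Base64 text. */
    static String encode(Path file) throws IOException {
        return Base64.getEncoder().encodeToString(Files.readAllBytes(file));
    }

    /** Encodes all files on a fixed-size pool; the result map keeps the input order. */
    static Map<Path, String> encodeAll(List<Path> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> encode(file));
            }
            // invokeAll blocks until every task is done and returns futures in task order
            List<Future<String>> futures = pool.invokeAll(tasks);
            Map<Path, String> result = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                // get() rethrows any read failure wrapped in an ExecutionException
                result.put(files.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

A pool size near `Runtime.getRuntime().availableProcessors()` is a common starting point. For 100k PDFs, collecting every encoded string in one map is likely to exhaust the heap, so in practice each task would probably write its result out (to a file or database) instead of returning it.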

You can check the currently available resources with something like this:

 public boolean areResourcesAvailable() {
     return imNotThatNice();
 }

/**
 * Gets the resource utility instance
 * 
 * @return the current instance of the resource utility
 */
private static OperatingSystemMXBean getInstance() {
    if (ResourceUtil.instance == null) {
        ResourceUtil.instance = ManagementFactory.getOperatingSystemMXBean();
    }
    return ResourceUtil.instance;
}
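
The answer leaves the actual availability check as a placeholder. One possible heuristic, offered only as an assumption and not as the answer's method, is to compare the system load average reported by `OperatingSystemMXBean` with the number of available processors. `ResourceCheck` is a hypothetical class name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class ResourceCheck {
    /** Pure decision rule, kept separate so it can be checked with fixed inputs. */
    static boolean areResourcesAvailable(double loadAverage, int processors) {
        // getSystemLoadAverage() returns a negative value when the platform does not report it
        if (loadAverage < 0) {
            return true;
        }
        // Assumed threshold: consider the machine busy once the load reaches the core count
        return loadAverage < processors;
    }

    /** Applies the rule to the values reported by the running JVM. */
    static boolean areResourcesAvailable() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return areResourcesAvailable(os.getSystemLoadAverage(), os.getAvailableProcessors());
    }
}
```

The threshold is arbitrary; a real deployment would tune it, and might also look at free heap via `Runtime.getRuntime().freeMemory()` before starting another batch.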

