How can I delete a list of files in bulk?

Time: 2017-04-07 05:17:09

Tags: java

I am trying to delete old files from a directory using the code below.

for (File listFile : listFiles) {
    if (listFile.lastModified() < purgeTime) { // delete files older than the purge time
        try {
            listFile.delete();
            logger.error(new StringBuffer(contextInfo).append("Files Deleted"));
        } catch (Exception e) {
            // the exception is swallowed here, so deletion failures go unnoticed
        }
    } else {
        logger.error(new StringBuffer(contextInfo).append("Files Not Deleted"));
    }
}

The problem I am facing is that if the directory holds more than 2 million entries, the application cannot handle it. Is there a way to delete them in batches?

3 Answers:

Answer 0: (score: 0)

I assume you are not using the new NIO API; it looks like you use file.listFiles(). In that case the JVM keeps all of those File objects in memory. Try the NIO file API instead:

try (DirectoryStream<Path> dir = Files.newDirectoryStream(yourFolder.toPath())) {
    for (Path file : dir) {
        Files.deleteIfExists(file);
    }
} catch (IOException e) {
    // handle the error here
}

In this case the code iterates over the directory entries lazily, without holding the whole listing in memory.
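A sketch of how this streaming approach could be combined with the questioner's purge-time check. The class name, the `purge` helper, and the temporary-directory demo in `main` are illustrative, not part of the original answer:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class PurgeOldFiles {
    // Deletes every regular file in dir whose last-modified time is older
    // than purgeTime (epoch millis); entries are streamed one at a time
    // instead of being loaded into a single huge array.
    static int purge(Path dir, long purgeTime) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toMillis() < purgeTime) {
                    if (Files.deleteIfExists(file)) {
                        deleted++;
                    }
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("purge-demo");
        Files.createFile(dir.resolve("old.log"));
        // A purge time in the future means every existing file qualifies.
        long purgeTime = System.currentTimeMillis() + 60_000;
        System.out.println(purge(dir, purgeTime)); // prints 1
        Files.delete(dir);
    }
}
```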

Answer 1: (score: 0)

You can delete the files in parallel using multiple threads, one file per task. Assuming you are on Java 8, the following code should serve as a guide:

List<File> listFiles = Arrays.asList(dir.listFiles());
listFiles.parallelStream().forEach(file -> {
    String filename = file.getName();
    if (file.lastModified() < purgeTime) {
        if (!file.delete()) {
            System.out.println("can't delete file: " + filename);
        } else {
            System.out.println("deleted: " + filename);
        }
    }
});

If you want to accomplish the same on Java 6, you can use the following approach:

File[] listFiles = dir.listFiles();
ExecutorService tpe = Executors.newFixedThreadPool(10);
for (final File file : listFiles) { // must be final so the anonymous class can capture it
    Runnable r = new Runnable() {
        @Override
        public void run() {
            String filename = file.getName();
            System.out.println(filename + ":" + file.lastModified());
            if (file.lastModified() < purgeTime) {
                if (!file.delete()) {
                    System.out.println("can't delete file: " + filename);
                } else {
                    System.out.println("deleted: " + filename);
                }
            }
        }
    };
    tpe.submit(r);
}
tpe.shutdown();
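One caveat about the executor version: shutdown() only stops new submissions, so the program can reach the end of main before the queued deletions have run. A minimal sketch of waiting for the pool to drain with awaitTermination (the counter here merely stands in for file.delete(); the class name and timeout are made up for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitShutdownDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService tpe = Executors.newFixedThreadPool(10);
        final int[] done = {0};
        for (int i = 0; i < 100; i++) {
            tpe.submit(new Runnable() {
                @Override
                public void run() {
                    // stands in for file.delete(); guarded so the count is exact
                    synchronized (done) { done[0]++; }
                }
            });
        }
        tpe.shutdown(); // stop accepting new tasks
        // Block until every queued task has run (or the timeout elapses).
        if (!tpe.awaitTermination(1, TimeUnit.HOURS)) {
            System.err.println("timed out before all tasks finished");
        }
        System.out.println(done[0]); // prints 100
    }
}
```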

Answer 2: (score: 0)

The reason for the java.lang.OutOfMemoryError: Java heap space is that listFiles returns an array of File objects. Depending on the path information, these objects can consume a lot of memory.

To solve the problem you can either use dir.listFiles() and increase the maximum heap space the JVM may use, or use dir.list() to reduce the memory consumed for storing the file names.

The difference between the two methods is that dir.listFiles() returns the full file information as File[], whereas dir.list() returns only the bare file names as String[].
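The difference between the two listing methods can be sketched with a tiny demo. The class name and the file name a.txt are made up for illustration; a temporary directory is used so the run is self-contained:

```java
import java.io.File;
import java.io.IOException;

public class ListingDemo {
    public static void main(String[] args) throws IOException {
        // fresh directory holding exactly one file
        File dir = java.nio.file.Files.createTempDirectory("listing-demo").toFile();
        new File(dir, "a.txt").createNewFile();

        File[] asFiles = dir.listFiles();  // each entry is a File carrying the full path
        String[] asNames = dir.list();     // each entry is just the bare file name

        System.out.println(asFiles[0].getName());                    // a.txt
        System.out.println(asNames[0]);                              // a.txt
        System.out.println(asFiles[0].getPath().equals(asNames[0])); // false

        new File(dir, "a.txt").delete();
        dir.delete();
    }
}
```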

Below is the scenario I used to test the Java 6 solution.

1) Create a directory with a long path name (233 characters) and two million dummy files. (This takes a while.)

#!/bin/sh

HUGE_DIR=/tmp/1234567890/1234567890/1234567890/\
1234567890/1234567890/1234567890/1234567890/\
1234567890/1234567890/1234567890/1234567890/\
1234567890/1234567890/1234567890/1234567890/\
1234567890/1234567890/1234567890/1234567890/\
1234567890/huge-dir

printf "length dir name: "
printf ${HUGE_DIR} | wc -c

mkdir -p ${HUGE_DIR}
cd ${HUGE_DIR}
dd if=/dev/zero of=masterfile bs=1 count=2000000
split -b 1 -a 10 masterfile
rm masterfile

2) Create a Java class to demonstrate the heap memory consumption.

import java.io.File;
import java.lang.management.ManagementFactory;

public class HugeDir {

    static long getUsedHeapSize() {
        return ManagementFactory.getMemoryMXBean()
                .getHeapMemoryUsage()
                .getUsed();
    }
    static final String OUT_FORMAT = "%-34s: %,13d\n";

    public static void main(String[] args) {
        System.out.printf("%s %s (%s bit)\n",
                System.getProperty("java.vm.name"),
                System.getProperty("java.version"),
                System.getProperty("sun.arch.data.model")
        );
        String hugeDir = "/tmp/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/huge-dir";

        long usedHeapBefore = getUsedHeapSize();

        File dir = new File(hugeDir);       
        Object[] listFiles;

        long start = System.currentTimeMillis();
        // tests were executed with either of the next two lines
        listFiles = dir.listFiles();
        // listFiles = dir.list();

        long end = System.currentTimeMillis();

        System.out.printf(OUT_FORMAT,
                "time spent for reading in ms",
                (end - start));
        System.out.printf(OUT_FORMAT,
                "files in huge-dir",
                listFiles.length);

        System.out.printf(OUT_FORMAT,
                "used heap before reading huge-dir", usedHeapBefore);
        System.out.printf(OUT_FORMAT,
                "used heap after reading huge-dir",
                getUsedHeapSize());
    }
}

The tests were executed with the 32-bit and 64-bit versions of Oracle JDK 1.6.0_45.

listFiles = dir.listFiles()

The class was executed with the maximum heap size specified on the command line. The values may differ on your machine.

# java -Xmx1300m -client HugeDir
Java HotSpot(TM) Client VM 1.6.0_45 (32 bit)
time spent for reading in ms      :        12,026
files in huge-dir                 :     2,000,000
used heap before reading huge-dir :       287,880
used heap after reading huge-dir  : 1,291,299,856

# java -Xmx1500m -server HugeDir
Java HotSpot(TM) Server VM 1.6.0_45 (32 bit)
time spent for reading in ms      :        15,324
files in huge-dir                 :     2,000,000
used heap before reading huge-dir :       403,872
used heap after reading huge-dir  : 1,310,415,976

# java -Xmx1600m HugeDir
Java HotSpot(TM) 64-Bit Server VM 1.6.0_45 (64 bit)
time spent for reading in ms      :        19,265
files in huge-dir                 :     2,000,000
used heap before reading huge-dir :       403,880
used heap after reading huge-dir  : 1,361,800,504

listFiles = dir.list()

All tests executed as java HugeDir.

Java HotSpot(TM) Client VM 1.6.0_45 (32 bit)
time spent for reading in ms      :         2,982
files in huge-dir                 :     2,000,000
used heap before reading huge-dir :       287,880
used heap after reading huge-dir  :   156,017,528

Java HotSpot(TM) Server VM 1.6.0_45 (32 bit)
time spent for reading in ms      :         2,665
files in huge-dir                 :     2,000,000
used heap before reading huge-dir :       403,872
used heap after reading huge-dir  :   182,349,984

Java HotSpot(TM) 64-Bit Server VM 1.6.0_45 (64 bit)
time spent for reading in ms      :         2,585
files in huge-dir                 :     2,000,000
used heap before reading huge-dir :       403,880
used heap after reading huge-dir  :   162,183,992

As you can see, after reading all the file names of the huge directory, the used heap is only about one eighth of what dir.listFiles() needs.

A possible solution (using dir.list()) to delete the files in such a huge directory could look like this:

import java.io.File;

public class HugeDirDelete {

    public static void main(String[] args) {
        System.out.printf("%s %s (%s bit)\n",
                System.getProperty("java.vm.name"),
                System.getProperty("java.version"),
                System.getProperty("sun.arch.data.model")
        );
        String hugeDir = "/tmp/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/1234567890/1234567890/1234567890/"
                + "1234567890/huge-dir";

        File dir = new File(hugeDir);
        String[] listFiles = dir.list();

        long start = System.currentTimeMillis();
        for (String fileName : listFiles) {
            String canonicalFileName = hugeDir + File.separator + fileName;
            File file = new File(canonicalFileName);

            // here you should add your deletion criteria check
            // for demonstration purpose simply all files are deleted

            if (!file.delete()) {
                System.out.printf("%-34s: %s\n",
                        "file could not be deleted",
                        canonicalFileName);
            }
        }
        long end = System.currentTimeMillis();

        System.out.printf("%-34s: %,9d\n",
                "files in huge-dir",
                listFiles.length);
        System.out.printf("%-34s: %,9d\n",
                "delete all files, duration in ms",
                (end - start));
    }
}

Output (the duration will differ from machine to machine):

Java HotSpot(TM) 64-Bit Server VM 1.6.0_45 (64 bit)
files in huge-dir                 : 2,000,000
delete all files, duration in ms  :   120,427