How to skip files that have already been processed

Posted: 2016-10-20 10:36:13

Tags: java

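I have an application that iterates over a folder full of files and extracts text from them. I want the application to record which files it has processed and then, when the program is run again, skip the files in that same folder whose text it has already extracted. At the moment I can record the processed files, but when I rerun the program the files are processed again, which slows everything down. What is wrong with the code below, and is there a more efficient way to do this?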

public class Iterator {
    static HashSet<String> myFiles = new HashSet<String>();
    public static Preferences prefs;
    static String filename = "/Files/FilesLogged.txt";
    static String folderName;
    static Path p;

    public Iterator() {
    }

    public static void main(String[] args) throws IOException, SAXException, TikaException, SQLException, ParseException, URISyntaxException, BackingStoreException {
        Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);

        BufferedReader reader = new BufferedReader(new InputStreamReader(ClassLoader.class.getResourceAsStream(filename)), 2048);
        String line = null;
        //Reading the files from the logger so they can be avoided
        while ((line = reader.readLine()) != null) {
            myFiles.add(line);
        }

        //This iterates through each of the files in the specified folder and copies them to a log.
        //It also checks to see if that file has been read already so that it isn't re-inputted into the database if run again
        //Loop through the ArrayList with the full path names of each folder in the outer loop
        String[] keys = userPrefs.keys();
        for (String folderName : keys) {
            //Extract the folder name from the Prefs and iterate through
            if (userPrefs.get(folderName, null) != null) {
                loopthrough(userPrefs.get(folderName, null));
            }
        }
        reader.close();
    }

    public static void loopthrough(String folderName) throws IOException, SAXException, TikaException, SQLException, ParseException, URISyntaxException {
        File dir = new File(folderName);
        File[] directoryListing = dir.listFiles();
        if (directoryListing != null) {
            for (File child : directoryListing) {
                if (!myFiles.contains(child.getName())) {
                    Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);
                    FileWriter fw = new FileWriter(userPrefs.get("PathForLogger", null), true);

                    BufferedWriter bw = new BufferedWriter(fw, 2048);
                    bw.write(child.getName().toString().trim());
                    bw.newLine();
                    bw.flush();
                    bw.close();
                    fw.close();
                }
            }
        }
    }
}

3 answers:

Answer 0 (score: 1)

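The usual approach when processing files is this: the first thing you do when you start processing a file is rename it (for example, give it an .inprocess suffix) or move it into an inprocess directory. When processing is finished, rename it to something like .done or move it into a done directory. That way, when you look for files to process, you can skip the ones that are in progress or already finished. It also makes it easy to see what needs to be reprocessed.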
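A minimal Java sketch of the directory-based variant of this approach, assuming three sibling directories named inbox, inprocess and done (hypothetical names) on the same file system. extractText is a hypothetical placeholder for the real Tika/database work, and runTwice only exists to build a throwaway demo layout and run the job twice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InProcessDoneDemo {

    // Moves each regular file in inbox to inProcess while it is worked on,
    // then to done. Returns how many files were processed on this run.
    static int processInbox(Path inbox, Path inProcess, Path done) throws IOException {
        Files.createDirectories(inProcess);
        Files.createDirectories(done);
        List<Path> pending;
        // Snapshot the listing first so files are not moved while it is being iterated.
        try (Stream<Path> listing = Files.list(inbox)) {
            pending = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : pending) {
            Path working = Files.move(file, inProcess.resolve(file.getFileName()));
            extractText(working);
            Files.move(working, done.resolve(working.getFileName()));
        }
        return pending.size();
    }

    // Hypothetical stand-in for the real extraction step (Tika, database insert, ...).
    static void extractText(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println("extracted " + text.length() + " chars from " + file.getFileName());
    }

    // Builds a temporary layout, runs the job twice and reports
    // {files handled on run 1, files handled on run 2, files now in done/}.
    static int[] runTwice() {
        try {
            Path root = Files.createTempDirectory("extract-demo");
            Path inbox = Files.createDirectories(root.resolve("inbox"));
            Path inProcess = root.resolve("inprocess");
            Path done = root.resolve("done");
            Files.write(inbox.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
            Files.write(inbox.resolve("b.txt"), "world".getBytes(StandardCharsets.UTF_8));
            int first = processInbox(inbox, inProcess, done);
            int second = processInbox(inbox, inProcess, done);
            int finished;
            try (Stream<Path> s = Files.list(done)) {
                finished = (int) s.count();
            }
            return new int[] {first, second, finished};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = runTwice();
        System.out.println("first run: " + r[0] + ", second run: " + r[1] + ", done: " + r[2]);
    }
}
```

The second run finds nothing left in inbox, so no log file is needed at all. Anything still sitting in inprocess when the program starts was interrupted part-way and is exactly the set that needs to be reprocessed.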

Answer 1 (score: 0)

I think the program reads from one file and writes to a different one.

  1. The file it reads from:

    new BufferedReader(new InputStreamReader(ClassLoader.class.getResourceAsStream(filename)),2048);

  2. The file it writes to:

    Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);

    FileWriter fw = new FileWriter(userPrefs.get("PathForLogger", null), true);

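  3. Of course, the program cannot work like this. The reader loads /Files/FilesLogged.txt as a classpath resource through getResourceAsStream, while the writer appends to whatever path is stored under the PathForLogger preference. Unless both happen to point to the same file on disk, the names it logs are never read back on the next run.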
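One way to act on this is to keep a single log path and use it for both reading and appending. The sketch below assumes the log is an ordinary file on disk (in the question that would be the path stored under PathForLogger); ProcessedLog, processNew and demo are hypothetical names, and a comment marks where the real extraction would go. It also trims names on both sides, because the question's code trims when writing but not when comparing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ProcessedLogDemo {

    // Remembers processed file names in ONE text file that is read at startup
    // and appended to afterwards (hypothetical helper, not from the question).
    static final class ProcessedLog {
        private final Path logFile;
        private final Set<String> names = new HashSet<>();

        ProcessedLog(Path logFile) throws IOException {
            this.logFile = logFile;
            if (Files.exists(logFile)) {
                for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
                    names.add(line.trim());
                }
            }
        }

        boolean isProcessed(String name) {
            return names.contains(name.trim());
        }

        // Appends the name to the same file the constructor read from.
        void markProcessed(String name) throws IOException {
            String key = name.trim();
            if (names.add(key)) {
                Files.write(logFile, Collections.singletonList(key), StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        }
    }

    // One "program run": reload the log, handle only unseen files, return how many.
    static int processNew(Path folder, Path logFile) throws IOException {
        ProcessedLog log = new ProcessedLog(logFile);
        int handled = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(folder)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (!log.isProcessed(name)) {
                    // extract text from child here
                    log.markProcessed(name);
                    handled++;
                }
            }
        }
        return handled;
    }

    // Returns {files handled on run 1, files handled on run 2, lines in the log}.
    static int[] demo() {
        try {
            Path root = Files.createTempDirectory("log-demo");
            Path folder = Files.createDirectories(root.resolve("files"));
            Path logFile = root.resolve("FilesLogged.txt");
            Files.write(folder.resolve("a.txt"), new byte[0]);
            Files.write(folder.resolve("b.txt"), new byte[0]);
            int first = processNew(folder, logFile);
            Files.write(folder.resolve("c.txt"), new byte[0]);
            int second = processNew(folder, logFile);
            int logged = Files.readAllLines(logFile, StandardCharsets.UTF_8).size();
            return new int[] {first, second, logged};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("first run: " + r[0] + ", second run: " + r[1] + ", logged: " + r[2]);
    }
}
```

demo() simulates two runs against the same log: the first handles a.txt and b.txt, the second only the newly added c.txt, so it returns {2, 1, 3}. Opening the log once per run with Files.write also avoids creating a new FileWriter and Preferences lookup for every file.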

Answer 2 (score: 0)

Set up a test environment with 20 files or fewer and check what happens.

Change your code like this:

    // In main():
    String line = null;
    //Reading the files from the logger so they can be avoided
    while ((line = reader.readLine()) != null)
    {
        myFiles.add(line);
        System.out.println("already processed: " + line);
    }

    // In loopthrough():
    for (File child : directoryListing)
    {
        String fileToCheck = child.getName();
        System.out.println("file to process: " + fileToCheck);
        if (!myFiles.contains(fileToCheck))
        {
            Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);
            FileWriter fw = new FileWriter(userPrefs.get("PathForLogger", null), true);

            BufferedWriter bw = new BufferedWriter(fw, 2048);
            bw.write(fileToCheck.trim());
            bw.newLine();
            bw.flush();
            bw.close();
            fw.close();
        }
    }

Then compare the names printed after "already processed" with the names printed after "file to process".

Or use a debugger.
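For the name comparison suggested in this answer, a small helper can list the folder entries the log does not contain and print every name wrapped in brackets, so stray leading or trailing whitespace becomes visible. notYetLogged and bracket are hypothetical helpers, and the sample names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompareNames {

    // Names present in the folder listing but absent from the log, in listing order.
    static List<String> notYetLogged(List<String> logged, List<String> inFolder) {
        Set<String> seen = new HashSet<>(logged);
        List<String> missing = new ArrayList<>();
        for (String name : inFolder) {
            if (!seen.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Wraps every name in [...] so leading/trailing whitespace shows up when printed.
    static List<String> bracket(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add("[" + name + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: the log line for a.txt carries a trailing space.
        List<String> logged = Arrays.asList("a.txt ", "b.txt");
        List<String> inFolder = Arrays.asList("a.txt", "b.txt", "c.txt");
        System.out.println("already processed: " + bracket(logged));
        System.out.println("not yet logged:    " + bracket(notYetLogged(logged, inFolder)));
    }
}
```

With this sample data the output is "already processed: [[a.txt ], [b.txt]]" followed by "not yet logged:    [[a.txt], [c.txt]]", which makes it obvious that a.txt is reprocessed only because of the trailing space in the log.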