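I have an application that iterates over a folder full of files and extracts the text from them. I want the application to record which files it has processed and then, when the program is run again, skip the files in that same folder that it has already extracted text from. Currently I can record the processed files, but when I rerun the program the files are processed again, which slows everything down. What is wrong with the code below, and is there a more efficient way to do this?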
public class Iterator {
    static HashSet<String> myFiles = new HashSet<String>();
    public static Preferences prefs;
    static String filename = "/Files/FilesLogged.txt";
    static String folderName;
    static Path p;

    public Iterator() {
    }

    public static void main(String[] args) throws IOException, SAXException, TikaException, SQLException, ParseException, URISyntaxException, BackingStoreException {
        Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);
        BufferedReader reader = new BufferedReader(new InputStreamReader(ClassLoader.class.getResourceAsStream(filename)), 2048);
        String line = null;
        //Reading the files from the logger so they can be avoided
        while ((line = reader.readLine()) != null) {
            myFiles.add(line);
        }
        //This iterates through each of the files in the specified folder and copies them to a log.
        //It also checks to see if that file has been read already so that it isn't re-inputted into the database if run again
        //Loop through the ArrayList with the full path names of each folder in the outer loop
        String[] keys = userPrefs.keys();
        for (String folderName : keys) {
            //Extract the folder name from the Prefs and iterate through
            if (userPrefs.get(folderName, null) != null) {
                loopthrough(userPrefs.get(folderName, null));
            }
        }
        reader.close();
    }

    public static void loopthrough(String folderName) throws IOException, SAXException, TikaException, SQLException, ParseException, URISyntaxException {
        File dir = new File(folderName);
        File[] directoryListing = dir.listFiles();
        if (directoryListing != null) {
            for (File child : directoryListing) {
                if (!myFiles.contains(child.getName())) {
                    Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);
                    FileWriter fw = new FileWriter(userPrefs.get("PathForLogger", null), true);
                    BufferedWriter bw = new BufferedWriter(fw, 2048);
                    bw.write(child.getName().toString().trim());
                    bw.newLine();
                    bw.flush();
                    bw.close();
                    fw.close();
                }
            }
        }
    }
}
Answer 0 (score: 1)
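The usual way to handle this kind of file processing is:
1. When you start processing a file, the first thing you do is rename it (for example by adding an .inprocess suffix or similar) or move it into an inprocess directory.
2. When processing is finished, rename it again (for example with a .done suffix or similar) or move it into a done directory.
That way, when you look for files to process, you can skip both the in-progress and the finished files. It also makes it easy to see what needs to be reprocessed.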
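A minimal sketch of this approach with java.nio.file, assuming a hypothetical layout of incoming/, inprocess/ and done/ subfolders under one root. StagedProcessor and extractText are illustrative names, and extractText only stands in for the real Tika extraction. Because finished files leave incoming/, a rerun has nothing left to skip, and no log file is needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the "move to inprocess, then to done" workflow.
// Assumptions: the incoming/inprocess/done layout, StagedProcessor and
// extractText are hypothetical, not taken from the original post.
public class StagedProcessor {

    // Hypothetical placeholder for the real text extraction (e.g. with Tika).
    static void extractText(Path file) throws IOException {
        System.out.println("extracting " + file.getFileName());
    }

    // Processes every regular file in root/incoming and returns how many were handled.
    static int processFolder(Path root) throws IOException {
        Path incoming = Files.createDirectories(root.resolve("incoming"));
        Path inProcess = Files.createDirectories(root.resolve("inprocess"));
        Path done = Files.createDirectories(root.resolve("done"));

        // Snapshot the listing first so that moving files does not disturb the iteration.
        List<Path> files;
        try (Stream<Path> listing = Files.list(incoming)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        int count = 0;
        for (Path file : files) {
            // 1. Claim the file. This throws FileAlreadyExistsException if a stale
            //    copy from an interrupted run is still sitting in inprocess/.
            Path working = Files.move(file, inProcess.resolve(file.getFileName()));
            extractText(working);
            // 2. Mark it finished; it will not be listed in incoming/ again.
            Files.move(working, done.resolve(working.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "files");
        System.out.println("processed " + processFolder(root) + " file(s)");
    }
}
```

Anything still in inprocess/ after a crash is exactly the set of files that needs reprocessing.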
Answer 1 (score: 0)
I think the program reads from one file and writes to a different one.
File being read:
new BufferedReader(new InputStreamReader(ClassLoader.class.getResourceAsStream(filename)),2048);
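File being written:
Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);
FileWriter fw = new FileWriter(userPrefs.get("PathForLogger", null), true);
Naturally, skipping cannot work if the program reads its log from one file and appends to another. The reader loads /Files/FilesLogged.txt as a classpath resource, while the writer appends to whatever path is stored under the PathForLogger preference, so unless those happen to be the same file on disk, the names written in one run are never read back in the next.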
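A minimal sketch of the single-log fix, assuming the log is an ordinary file on disk that lives outside the scanned folder; the logPath parameter stands in for userPrefs.get("PathForLogger", null), and ProcessedLog, processNewFiles and extractText are hypothetical names. It reads and appends to the same Path, trims names on both sides so the lookup matches what was written, and opens the writer once per run instead of once per file, which also helps with the efficiency question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: read the processed-files log from the SAME path that is appended to.
// Assumption: logPath replaces userPrefs.get("PathForLogger", null); the class
// and method names are hypothetical.
public class ProcessedLog {

    // Hypothetical placeholder for the real text extraction.
    static void extractText(Path file) {
        System.out.println("extracting " + file.getFileName());
    }

    // Processes files in folder that are not yet listed in logPath; returns how many.
    static int processNewFiles(Path folder, Path logPath) throws IOException {
        // Load every name already recorded; a missing log means nothing was processed yet.
        Set<String> seen = new HashSet<>();
        if (Files.exists(logPath)) {
            for (String line : Files.readAllLines(logPath, StandardCharsets.UTF_8)) {
                seen.add(line.trim());
            }
        }

        List<Path> files;
        try (Stream<Path> listing = Files.list(folder)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        int count = 0;
        // Open the log once for appending (creating it if needed), not once per file.
        try (BufferedWriter out = Files.newBufferedWriter(logPath, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Path file : files) {
                String name = file.getFileName().toString().trim();
                if (seen.add(name)) { // add() returns false if the name was already logged
                    extractText(file);
                    out.write(name);
                    out.newLine();
                    out.flush(); // persist progress in case a later file fails
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ProcessedLog <folder> <logFile>");
            return;
        }
        int n = processNewFiles(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("processed " + n + " new file(s)");
    }
}
```

Running it twice against the same folder and log should report the new files on the first run and none on the second.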
Answer 2 (score: 0)
Create a test environment with 20 or fewer files and check the behavior there.
Change your code so it prints what it compares:
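A throwaway test folder of that size can be created with a few lines of java.nio.file. This is only a sketch: TestFolder, the doc-N.txt naming and the file contents are arbitrary choices, not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: build a small temporary folder to test the skip-already-processed logic.
// Assumption: TestFolder and the doc-N.txt names are hypothetical.
public class TestFolder {

    // Creates a new temp directory containing fileCount small text files.
    static Path create(int fileCount) throws IOException {
        if (fileCount < 1 || fileCount > 20) {
            throw new IllegalArgumentException("use between 1 and 20 files");
        }
        Path dir = Files.createTempDirectory("iterator-test");
        for (int i = 1; i <= fileCount; i++) {
            Path file = dir.resolve("doc-" + i + ".txt");
            Files.write(file, ("sample text " + i).getBytes(StandardCharsets.UTF_8));
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path dir = create(5);
        System.out.println("test folder: " + dir);
        // Feed this path to the code under test (e.g. the question's loopthrough method)
        // and run it twice; the second run should find no new files.
    }
}
```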
// In main(): print every name read from the log
String line = null;
//Reading the files from the logger so they can be avoided
while ((line = reader.readLine()) != null)
{
    myFiles.add(line);
    System.out.println("already processed: " + line);
}

// In loopthrough(): print every name found in the folder before checking it
for (File child : directoryListing)
{
    String fileToCheck = child.getName();
    System.out.println("file to process: " + fileToCheck);
    if (!myFiles.contains(fileToCheck))
    {
        Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);
        FileWriter fw = new FileWriter(userPrefs.get("PathForLogger", null), true);
        BufferedWriter bw = new BufferedWriter(fw, 2048);
        bw.write(fileToCheck.trim());
        bw.newLine();
        bw.flush();
        bw.close();
        fw.close();
    }
}
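Then compare the names printed after "already processed:" with those printed after "file to process:".
Alternatively, step through the code with a debugger.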