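I have a requirement to scan certain files for certain keywords. My keyword list has about 40,000 entries, and each of my files is about 4,000 lines long. In addition, a keyword must not count when it appears inside a comment, so I also have to handle comments. The code I wrote to detect keyword occurrences takes about 5 minutes per file. I don't know what I can do to reduce the execution time. The code is shown below.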
for (File fl : files) {
    flag = false;
    content = FileUtils.readFileToString(fl);
    System.out.println(fl.getName());
    fileName = fl.getName();
    // Object Keywords scanning
    keywords = null;
    keywords = findKeywordType(fileName);
    if (keywords != null) {
        Boolean keywordCount = false;
        for (String[] key : keywords) {
            key[0] = key[1];
        }
        for (String[] key : keywords) {
            Boolean check = false;
            if (content.contains(key[0])) {
                if (content.contains(key[3] + ".")) {
                    check = true;
                }
                if (check) {
                    continue;
                }
                if (content.contains(key[3])) {
                    keywordCount = FindOccurence(fl, key[0], key[3]);
                    if (keywordCount) {
                        System.out.println("Writing keywords");
                        objKwm = new ObjectKeywordMaster();
                        objKwm.setObjectName(key[0]);
                        objKwm.setObjectType(key[1]);
                        objKwm.setObjectOwner(key[2]);
                        objKwm.setDependentObjectName(key[3]);
                        objKwm.setDependentObjectType(key[4]);
                        objKwm.setDependentObjectOwner(key[5]);
                        objKw.getObjectKeywords().add(objKwm);
                    }
                }
            }
        }
    }
}
The code of the FindOccurence method is:
private static Boolean FindOccurence(File fl, String objectName, String keyword) throws IOException {
    int startComment = 0;
    int endComment = 0;
    Boolean objCheck = false;
    Boolean keyCheck = false;
    Boolean check = false;
    List line = FileUtils.readLines(fl);
    int fileLength = line.size();
    int objCount = 0;
    int keyCount = 0;
    loop:
    for (int j = 0; j < fileLength; j++) {
        if (line.get(j).toString().contains("/*")) {
            startComment = j;
        }
        if (line.get(j).toString().contains("*/")) {
            endComment = j;
        }
        if (line.get(j).toString().contains(objectName)) {
            objCheck = false;
            Pattern p = Pattern.compile("\\b" + objectName + "\\b");
            Matcher m = p.matcher(line.get(j).toString());
            while (m.find()) {
                objCheck = true;
                objCount++;
            }
            if (objCheck) {
                if (line.get(j).toString().contains("#")) {
                    int objIndex = line.get(j).toString().indexOf(objectName);
                    int commentIndex = line.get(j).toString().indexOf("#");
                    if (objIndex > commentIndex) {
                        objCount--;
                    }
                } else {
                    if (line.get(j).toString().contains("--")) {
                        int objIndex = line.get(j).toString().indexOf(objectName);
                        int commentIndex = line.get(j).toString().indexOf("--");
                        if (objIndex > commentIndex) {
                            objCount--;
                        }
                    }
                }
                if ((j >= startComment && j <= endComment) || (j >= startComment && endComment == 0)) {
                    objCount--;
                }
            }
        }
        if (line.get(j).toString().contains(keyword)) {
            keyCheck = false;
            Pattern p = Pattern.compile("\\b" + keyword + "\\b");
            Matcher m = p.matcher(line.get(j).toString());
            while (m.find()) {
                keyCheck = true;
                keyCount++;
            }
            if (keyCheck) {
                if (line.get(j).toString().contains("#")) {
                    int objIndex = line.get(j).toString().indexOf(keyword);
                    int commentIndex = line.get(j).toString().indexOf("#");
                    if (objIndex > commentIndex) {
                        keyCount--;
                    }
                } else {
                    if (line.get(j).toString().contains("--")) {
                        int objIndex = line.get(j).toString().indexOf(keyword);
                        int commentIndex = line.get(j).toString().indexOf("--");
                        if (objIndex > commentIndex) {
                            keyCount--;
                        }
                    }
                }
                if ((j >= startComment && j <= endComment) || (j >= startComment && endComment == 0)) {
                    keyCount--;
                }
            }
        }
        if (objCount > 0 && keyCount > 0) {
            check = true;
            break loop;
        } else
            check = false;
    }
    return check;
}
}
I have to find occurrences of two keywords that are in the same list. Please suggest some ways to reduce the execution time.
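Answer 0 (score: 0)
1) Before you start looking for any keyword, prepare the file content: strip the comments, ...
2) Split the file content into words.
3) Do not loop over every keyword: use a Set that stores all the keywords.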
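The three steps above could look roughly like the sketch below. It is only an illustration under some assumptions: the class name KeywordScanner and its methods are made up for this example, keywords are assumed to consist only of word characters (letters, digits, underscore), and the comment styles handled are the ones the question's code checks for (/* ... */, -- and #). A comment marker inside a string literal is treated as a comment, and the extra check on key[3] + "." from the question is not reproduced.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class KeywordScanner {
    // Block comments, possibly spanning lines; an unterminated "/*" runs to end of input.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*.*?(?:\\*/|\\z)", Pattern.DOTALL);
    // Line comments starting with "--" or "#", up to the end of the line.
    private static final Pattern LINE_COMMENT = Pattern.compile("(?:--|#)[^\\n]*");
    // Anything that is not a word character separates two words.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Step 1: strip comments once per file instead of once per keyword.
    static String stripComments(String content) {
        String withoutBlocks = BLOCK_COMMENT.matcher(content).replaceAll(" ");
        return LINE_COMMENT.matcher(withoutBlocks).replaceAll(" ");
    }

    // Steps 2 and 3: split the cleaned content into words and keep the
    // ones that are in the keyword set (one hash lookup per word).
    static Set<String> findKeywords(String content, Set<String> keywords) {
        Set<String> found = new HashSet<>();
        for (String word : NON_WORD.split(stripComments(content))) {
            if (keywords.contains(word)) {
                found.add(word);
            }
        }
        return found;
    }

    // A keyword row matches when both of its names occur outside comments.
    static boolean bothPresent(Set<String> found, String objectName, String dependentName) {
        return found.contains(objectName) && found.contains(dependentName);
    }
}
```

With this, each file is read and cleaned once, findKeywords is called once with a Set built from all key[0] and key[3] values, and each keyword row then costs two set lookups through bothPresent(found, key[0], key[3]) instead of re-reading the file and compiling two regular expressions.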