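Problem: I have an array of about 700 strings that I read into a List. I also have a directory containing more than 1500 files. I need to open each of these files and check whether any of the 700 strings appears in it.
Current solution: after reading in the 700 strings (which is almost instantaneous), this is what I'm doing: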
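import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.commons.lang3.StringUtils;

// Fields used by the scan (declared here so the snippet compiles)
private static int currentCount = 0;
private static final Map<String, String> matchLocations = new HashMap<>();

public static void scanMyDirectory(final File myDirectory, final List<String> listOfStrings) {
    File[] files = myDirectory.listFiles();
    if (files == null) { // not a directory, or an I/O error occurred
        return;
    }
    for (final File fileEntry : files) {
        System.out.println("Entering file: " + currentCount++);
        if (fileEntry.isDirectory()) {
            scanMyDirectory(fileEntry, listOfStrings);
        } else {
            // try-with-resources closes the reader even if an exception is thrown
            try (BufferedReader br = new BufferedReader(new FileReader(fileEntry))) {
                String sCurrentLine;
                while ((sCurrentLine = br.readLine()) != null) {
                    // Every line is tested against all ~700 strings
                    for (String candidate : listOfStrings) {
                        if (StringUtils.containsIgnoreCase(sCurrentLine, candidate)) {
                            // Note: put() replaces any path recorded earlier for this string
                            matchLocations.put(candidate, fileEntry.getPath());
                        }
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}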
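After this method runs, all the results are stored in a HashMap, which I then print to the screen or write to a file.
Question: what is a faster way to do this? It is very slow (processing ~1500 files takes about 20-25 minutes). I'm not very familiar with threading, but I have considered using it. However, the top answer in this question put me off a bit. What is the best way to improve performance?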
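A Map<String, String> holds one value per key, so with the HashMap used in the question each put(...) for a string that occurs in several files replaces the previously stored path, and only the last matching file survives. A minimal sketch, assuming you want every matching file per string, maps each string to a list of paths via computeIfAbsent. The class and method names (MatchIndex, record, filesContaining) are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: each search string maps to every file path it was found in
final class MatchIndex {
    private final Map<String, List<String>> matchLocations = new HashMap<>();

    // Hypothetical helper: record that `term` occurs in the file at `path`
    void record(String term, String path) {
        // Creates the list on first use, then appends instead of overwriting
        matchLocations.computeIfAbsent(term, k -> new ArrayList<>()).add(path);
    }

    List<String> filesContaining(String term) {
        return matchLocations.getOrDefault(term, Collections.emptyList());
    }

    public static void main(String[] args) {
        MatchIndex index = new MatchIndex();
        index.record("foo", "a.txt");
        index.record("foo", "b.txt");
        System.out.println(index.filesContaining("foo")); // prints [a.txt, b.txt]
    }
}
```

This is not thread-safe; for concurrent scanning the map and the per-key collections would need concurrent implementations.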
Answer 0 (score: 2)
I prefer NIO for reading the lines:
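import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

private final Map<String, String> matchLocations = new HashMap<>();
private int currentCount = 0;

public void scanMyDirectory(final File myDirectory, final List<String> listOfStrings) {
    File[] files = myDirectory.listFiles();
    if (files == null) {
        return;
    }
    Stream.of(files).forEach(fileEntry -> {
        if (fileEntry.isDirectory()) {
            scanMyDirectory(fileEntry, listOfStrings);
        } else {
            System.out.println("Entering file: " + currentCount++);
            try {
                // Read the whole file and lower-case it once, instead of
                // calling containsIgnoreCase for every line and every string
                List<String> lines = Files.readAllLines(fileEntry.toPath(), StandardCharsets.UTF_8);
                StringBuilder sb = new StringBuilder();
                lines.forEach(s -> sb.append(s.toLowerCase(Locale.ROOT)).append("\n"));
                listOfStrings.forEach(s -> {
                    // indexOf returns -1 when absent; 0 is a valid match at the very start
                    if (sb.indexOf(s.toLowerCase(Locale.ROOT)) >= 0) {
                        matchLocations.put(s, fileEntry.getPath());
                    }
                });
            } catch (IOException e) {
                // readAllLines also throws (MalformedInputException) on non-UTF-8 content
                e.printStackTrace();
            }
        }
    });
}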
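The main saving in the version above is that each file is lower-cased once and each string is searched with a single indexOf over the whole content, rather than calling containsIgnoreCase for every line against every string. A minimal sketch of that matching step as a pure function, which also lower-cases the search strings once up front instead of once per file. The names TermMatcher and matches are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical class: precomputes lower-cased terms, then tests file contents against them
final class TermMatcher {
    private final List<String> originalTerms;
    private final List<String> loweredTerms; // lower-cased once, reused for every file

    TermMatcher(List<String> terms) {
        this.originalTerms = new ArrayList<>(terms);
        this.loweredTerms = new ArrayList<>(terms.size());
        for (String t : terms) {
            loweredTerms.add(t.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the original search strings that occur in `content`, ignoring case
    Set<String> matches(String content) {
        String haystack = content.toLowerCase(Locale.ROOT); // one pass per file
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < loweredTerms.size(); i++) {
            // contains() is indexOf(...) >= 0, so a match at index 0 counts
            if (haystack.contains(loweredTerms.get(i))) {
                found.add(originalTerms.get(i));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        TermMatcher m = new TermMatcher(Arrays.asList("Alpha", "beta", "gamma"));
        System.out.println(m.matches("ALPHA at the start, then Beta")); // prints [Alpha, beta]
    }
}
```

Lower-casing with Locale.ROOT avoids locale-specific surprises (such as the Turkish dotless i) when the default locale differs.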
As mentioned above, multithreading isn't really necessary here... but if you're interested:
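import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

private final ConcurrentHashMap<String, String> matchLocations = new ConcurrentHashMap<>();
private final ForkJoinPool pool = new ForkJoinPool();
private int currentCount = 0; // only touched by the thread calling scanMyDirectory

public void scanMyDirectory(final File myDirectory, final List<String> listOfStrings) {
    File[] files = myDirectory.listFiles();
    if (files == null) {
        return;
    }
    Stream.of(files).forEach(fileEntry -> {
        if (fileEntry.isDirectory()) {
            scanMyDirectory(fileEntry, listOfStrings);
        } else {
            System.out.println("Entering file: " + currentCount++);
            pool.submit(new Reader(listOfStrings, fileEntry));
        }
    });
}

// Hypothetical helper: submit() does not block, so call this after
// scanMyDirectory returns and before reading matchLocations
public void awaitCompletion() throws InterruptedException {
    pool.shutdown(); // already-submitted tasks still run
    pool.awaitTermination(1, TimeUnit.HOURS);
}

class Reader implements Runnable {
    final List<String> listOfStrings;
    final File file;

    Reader(List<String> listOfStrings, File file) {
        this.listOfStrings = listOfStrings;
        this.file = file;
    }

    @Override
    public void run() {
        try {
            List<String> lines = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);
            StringBuilder sb = new StringBuilder();
            lines.forEach(s -> sb.append(s.toLowerCase(Locale.ROOT)).append("\n"));
            listOfStrings.forEach(s -> {
                // >= 0, not > 0: a match at the very start of the file is still a match
                if (sb.indexOf(s.toLowerCase(Locale.ROOT)) >= 0) {
                    matchLocations.put(s, file.getPath());
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}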
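Tasks submitted to a ForkJoinPool run asynchronously, so the caller has to wait for them before reading the results. If you would rather not manage a pool yourself, one alternative (a sketch assuming Java 8+, not the answer's own method) is to list the files with Files.walk and process them with a parallel stream: the terminal forEach only returns once every file has been handled, and a ConcurrentHashMap holding a concurrent set per key collects matches safely from several threads. The names ParallelScan and scan are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ParallelScan {
    // Hypothetical helper: maps each lower-cased term to the set of files containing it
    static Map<String, Set<String>> scan(Path root, List<String> terms) throws IOException {
        List<String> lowered = terms.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());

        // Collect the file list first: a List splits well for parallel work,
        // whereas the stream returned by Files.walk does not
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        Map<String, Set<String>> matches = new ConcurrentHashMap<>();
        // forEach on a parallel stream blocks until all files are processed
        files.parallelStream().forEach(path -> {
            String content;
            try {
                content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
                        .toLowerCase(Locale.ROOT);
            } catch (IOException e) {
                e.printStackTrace(); // skip unreadable files, as the answer does
                return;
            }
            for (String term : lowered) {
                if (content.contains(term)) {
                    // computeIfAbsent is atomic on ConcurrentHashMap; the value set is concurrent too
                    matches.computeIfAbsent(term, k -> ConcurrentHashMap.newKeySet())
                           .add(path.toString());
                }
            }
        });
        return matches;
    }
}
```

Parallel streams use the common ForkJoinPool by default, so the degree of parallelism roughly follows the number of CPU cores. If the disk rather than the CPU is the bottleneck, the speedup may be small.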
Edit
Bug fix (each string now maps to a list of every file it was found in, instead of only the last one):
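import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

private final ConcurrentHashMap<String, List<String>> matchLocations = new ConcurrentHashMap<>();
private final ForkJoinPool pool = new ForkJoinPool();
private int currentCount = 0; // only touched by the thread calling scanMyDirectory

public void scanMyDirectory(final File myDirectory, final List<String> listOfStrings) {
    File[] files = myDirectory.listFiles();
    if (files == null) {
        return;
    }
    Stream.of(files).forEach(fileEntry -> {
        if (fileEntry.isDirectory()) {
            scanMyDirectory(fileEntry, listOfStrings);
        } else {
            System.out.println("Entering file: " + currentCount++);
            Reader reader = new Reader(listOfStrings, fileEntry);
            pool.submit(reader);
        }
    });
}

// Hypothetical helper: submit() does not block, so call this after
// scanMyDirectory returns and before reading matchLocations
public void awaitCompletion() throws InterruptedException {
    pool.shutdown(); // already-submitted tasks still run
    pool.awaitTermination(1, TimeUnit.HOURS);
}

class Reader implements Runnable {
    final List<String> listOfStrings;
    final File file;

    Reader(List<String> listOfStrings, File file) {
        this.listOfStrings = listOfStrings;
        this.file = file;
    }

    @Override
    public void run() {
        try (FileInputStream fileInputStream = new FileInputStream(file);
             FileChannel channel = fileInputStream.getChannel()) {
            // Collect raw bytes and decode once at the end, so multi-byte
            // characters split across buffer boundaries decode correctly
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ByteBuffer buffer = ByteBuffer.allocate(8192); // larger buffer, fewer read calls
            while (channel.read(buffer) != -1) {
                buffer.flip();
                // Only bytes [0, limit) were filled by this read; the rest are stale
                bytes.write(buffer.array(), 0, buffer.limit());
                buffer.clear();
            }
            String content = bytes.toString(StandardCharsets.UTF_8.name()).toLowerCase(Locale.ROOT);
            listOfStrings.stream()
                    .map(s -> s.toLowerCase(Locale.ROOT))
                    .filter(content::contains) // also matches at index 0
                    // computeIfAbsent is atomic on ConcurrentHashMap (a separate get/put is not),
                    // and CopyOnWriteArrayList tolerates concurrent add() calls
                    .forEach(s -> matchLocations
                            .computeIfAbsent(s, k -> new CopyOnWriteArrayList<>())
                            .add(file.getAbsolutePath()));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}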
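When reading through a FileChannel, only the bytes between position and limit (after flip()) were filled by the latest read. Decoding the whole of buffer.array() can pick up stale bytes left over from an earlier, fuller read, and decoding each chunk separately can split a multi-byte UTF-8 character. A minimal sketch of reading a whole file through a channel into a String, assuming UTF-8 content. The names ChannelText and readAll are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelText {
    // Hypothetical helper: reads the whole file via a FileChannel and decodes it as UTF-8
    static String readAll(Path path) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            while (channel.read(buffer) != -1) {
                buffer.flip(); // position = 0, limit = number of bytes just read
                bytes.write(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining());
                buffer.clear(); // ready for the next read
            }
        }
        // Decode once at the end so characters spanning buffer boundaries stay intact
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For files of this size, Files.readAllBytes(path) gives the same result with less code. The explicit channel loop mainly shows the flip()/limit bookkeeping.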