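Dear developers, I am writing a Java program that compares two text files line by line. The first file has 99,000 lines and the second has 115,000 lines. I want to read both files and, whenever a line of the first file matches a line of the second, print the match. My code works, but because of the nested for loops it takes about 10 minutes to finish printing. How can I make it fast and memory-efficient? Please guide me. Thanks.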
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Main {
    static final String file1 = "file1.txt";
    static final String file2 = "file2.txt";
    static BufferedReader b1 = null;
    static BufferedReader b2 = null;
    static List<String> list_file1 = null;
    static List<String> list_file2 = null;

    public static void main(String[] args) {
        list_file1 = new ArrayList<String>();
        list_file2 = new ArrayList<String>();
        String lineText = null;
        try {
            b1 = new BufferedReader(new FileReader(file1));
            while ((lineText = b1.readLine()) != null) {
                list_file1.add(lineText);
            }
            b2 = new BufferedReader(new FileReader(file2));
            while ((lineText = b2.readLine()) != null) {
                list_file2.add(lineText);
            }
            compareFile(list_file1, list_file2);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Compares every line of file 1 with every line of file 2 (the slow part)
    private static void compareFile(List<String> list_file1, List<String> list_file2) {
        for (String content1 : list_file1) {
            for (String content2 : list_file2) {
                if (content1.equals(content2)) {
                    System.out.println("Match Found:-" + content1);
                }
            }
        }
    }
}
Answer 0: (score: 0)
Use a HashSet and its contains method:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class Main {
    static final String file1 = "/tmp/file1";
    static final String file2 = "/tmp/file2";

    public static void main(String[] args) {
        // Lines of file 1, kept in a hash set so each lookup takes constant time on average
        Set<String> linesOfFile1 = new HashSet<>();
        String lineText;
        // try-with-resources closes both readers even if an exception is thrown
        try (BufferedReader b1 = new BufferedReader(new FileReader(file1));
             BufferedReader b2 = new BufferedReader(new FileReader(file2))) {
            while ((lineText = b1.readLine()) != null) {
                linesOfFile1.add(lineText);
            }
            // Stream file 2 and print every line that also occurs in file 1
            while ((lineText = b2.readLine()) != null) {
                if (linesOfFile1.contains(lineText)) {
                    System.out.println("Match Found:-" + lineText);
                }
            }
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException, so one catch covers both
            e.printStackTrace();
        }
    }
}
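If the files contain duplicate lines and every occurrence should be reported, as in your original code, you can count the occurrences in a HashMap instead: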
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Main {
    static final String file1 = "/tmp/file1";
    static final String file2 = "/tmp/file2";

    public static void main(String[] args) {
        // Maps each distinct line of file 1 to the number of times it occurs there
        Map<String, Integer> lineCounts = new HashMap<>();
        String lineText;
        // try-with-resources closes both readers even if an exception is thrown
        try (BufferedReader b1 = new BufferedReader(new FileReader(file1));
             BufferedReader b2 = new BufferedReader(new FileReader(file2))) {
            while ((lineText = b1.readLine()) != null) {
                // Insert 1 for a new line, otherwise add 1 to the existing count
                lineCounts.merge(lineText, 1, Integer::sum);
            }
            while ((lineText = b2.readLine()) != null) {
                // Print the line once for each of its occurrences in file 1
                int count = lineCounts.getOrDefault(lineText, 0);
                for (int i = 0; i < count; i++) {
                    System.out.println("Match Found:-" + lineText);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
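To see why counting occurrences reproduces the original behaviour: the nested loops report a line that occurs c1 times in file 1 and c2 times in file 2 exactly c1 × c2 times, and so does the map-based approach, since each of the c2 occurrences in file 2 is printed c1 times. The following is only a small sketch that checks this on in-memory lists; the class and method names (`DuplicateCountCheck`, `nestedLoopCount`, `mapCount`) are made up for illustration and are not part of the answer above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateCountCheck {
    // Number of "Match Found" lines the original nested loops would print
    static int nestedLoopCount(List<String> file1Lines, List<String> file2Lines) {
        int matches = 0;
        for (String a : file1Lines) {
            for (String b : file2Lines) {
                if (a.equals(b)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    // Number of "Match Found" lines the counting approach would print
    static int mapCount(List<String> file1Lines, List<String> file2Lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String a : file1Lines) {
            counts.merge(a, 1, Integer::sum);
        }
        int matches = 0;
        for (String b : file2Lines) {
            matches += counts.getOrDefault(b, 0);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Illustrative data only: "x" occurs twice in the first list and three times in the second
        List<String> first = Arrays.asList("x", "x", "y");
        List<String> second = Arrays.asList("x", "z", "x", "x");
        System.out.println(nestedLoopCount(first, second));
        System.out.println(mapCount(first, second));
    }
}
```

Both methods return the same count, but the second does one hash lookup per line of file 2 instead of one comparison per pair of lines.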
Answer 1: (score: 0)
Try the following. I removed one of the loops from your code; duplicates can still be reported. Here I use a Java 8 stream.
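import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        final String file1 = "file1.txt";
        final String file2 = "file2.txt";
        List<String> list_file1 = new ArrayList<>();
        String lineText;
        // try-with-resources closes both readers even if an exception is thrown
        try (BufferedReader b1 = new BufferedReader(new FileReader(file1));
             BufferedReader b2 = new BufferedReader(new FileReader(file2))) {
            while ((lineText = b1.readLine()) != null) {
                list_file1.add(lineText);
            }
            while ((lineText = b2.readLine()) != null) {
                // A lambda can only capture an effectively final variable
                final String text = lineText;
                // Scans all of list_file1 for each line of file 2; the comparison ignores case
                list_file1.stream()
                        .filter(s -> s.equalsIgnoreCase(text))
                        .forEach(s -> System.out.println("Match Found:-" + text));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}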
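Note that the stream version above still scans the whole of list_file1 for every line of file 2, so the number of comparisons is the same as with the nested loops, and that it matches case-insensitively through equalsIgnoreCase. If case-insensitive matching is really what is wanted (an assumption; the question uses equals), a rough sketch that keeps that behaviour with a hash lookup could look like this. The names `CaseInsensitiveMatch`, `key` and `findMatches` are hypothetical, and lower-casing with `Locale.ROOT` only approximates equalsIgnoreCase for some unusual Unicode characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveMatch {
    // Hypothetical helper: lines that differ only in case get the same key
    static String key(String line) {
        return line.toLowerCase(Locale.ROOT);
    }

    // Returns the file 2 line once for every file 1 line it matches ignoring case,
    // in file 2 order, which is what the stream version prints
    static List<String> findMatches(List<String> file1Lines, List<String> file2Lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : file1Lines) {
            counts.merge(key(line), 1, Integer::sum);
        }
        List<String> matches = new ArrayList<>();
        for (String line : file2Lines) {
            int occurrences = counts.getOrDefault(key(line), 0);
            for (int i = 0; i < occurrences; i++) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Illustrative data only
        List<String> first = Arrays.asList("One", "two", "ONE");
        List<String> second = Arrays.asList("one", "three", "TWO");
        for (String match : findMatches(first, second)) {
            System.out.println("Match Found:-" + match);
        }
    }
}
```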
Answer 2: (score: 0)
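The following prints the lines in the same order as your own code: that is, if file 1 contains the lines "one", "two" and file 2 contains the lines "two", "one", in that order, the output will be "one", "two". To achieve this, we first read file 2 and build a map from each line to its number of occurrences: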
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class Main {
    static void printDuplicateLines(String filename1, String filename2) throws IOException {
        // Index the lines of file 2 with a map of line -> count
        Map<String, Integer> linesOfFile2 = new HashMap<>();
        try (Stream<String> lines = Files.lines(Paths.get(filename2))) {
            lines.forEach(line -> linesOfFile2.merge(line, 1, (oldValue, x) -> oldValue + 1));
        }
        // Check file 1 to see which lines are duplicate
        try (Stream<String> lines = Files.lines(Paths.get(filename1))) {
            lines.forEach(line -> {
                int countOccurrencesInFile2 = linesOfFile2.getOrDefault(line, 0);
                for (int i = 1; i <= countOccurrencesInFile2; i++) {
                    System.out.println("Match Found:-" + line);
                }
            });
        }
    }
}
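We then read file 1 line by line, look up how many times that line occurs in file 2 (0 if it does not occur), and print the line that many times.
Note the use of try-with-resources to ensure that the files are closed properly.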
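The same try-with-resources pattern also works with the BufferedReader loops used in the earlier answers. Below is a minimal sketch, not a definitive implementation: the class and method names (`MatchingLines`, `findMatches`) are made up for illustration, the set-based lookup reports each matching line of file 2 once, and main writes two small temporary files purely to have something to run against.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MatchingLines {
    // Returns the lines of file2 that also occur in file1, in file2 order
    static List<String> findMatches(Path file1, Path file2) throws IOException {
        Set<String> linesOfFile1 = new HashSet<>();
        // The reader is closed automatically when the try block exits, even on an exception
        try (BufferedReader reader = Files.newBufferedReader(file1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                linesOfFile1.add(line);
            }
        }
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file2)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (linesOfFile1.contains(line)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Temporary files with illustrative content only
        Path first = Files.createTempFile("file1", ".txt");
        Path second = Files.createTempFile("file2", ".txt");
        try {
            Files.write(first, Arrays.asList("one", "two", "three"));
            Files.write(second, Arrays.asList("three", "four", "one"));
            for (String match : findMatches(first, second)) {
                System.out.println("Match Found:-" + match);
            }
        } finally {
            Files.deleteIfExists(first);
            Files.deleteIfExists(second);
        }
    }
}
```

Returning the matches as a list instead of printing them inside the loop makes the method easy to check in a unit test; for very large outputs, printing directly as in the answers above avoids holding the results in memory.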