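I have a bunch of ids in 4 files a, b, c, d saved in my workspace. I want to merge all of these ids, in sorted order, into a single file merged.txt. They will be saved as strings, one per line. I can sort each file individually by reading it into memory. But how do I merge them, given that there may be duplicate entries? I can't figure out how to compare the entries across the four files (the number can grow to 8, so it can't be hard-coded). In particular, how do I compare the entries, and how do I advance only the file pointers of the files that hold the smallest entry?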
public void sortFile() throws IOException
{
    File a = new File("/Users/phoenix/workspace/data/a.txt");
    File b = new File("/Users/phoenix/workspace/data/b.txt");
    File c = new File("/Users/phoenix/workspace/data/c.txt");
    File d = new File("/Users/phoenix/workspace/data/d.txt");
    doSort(a);
    doSort(b);
    doSort(c);
    doSort(d);
    merge();
}
How should I modify the merge method, starting from the pseudocode below?
public void merge()
{
    File dir = new File("/Users/phoenix/workspace/data");
    for(File f: dir.listFiles())
    {
        // toDo: merge into a single file merged.txt
    }
}
public void doSort(File f) throws IOException
{
    BufferedReader reader = new BufferedReader(new FileReader(f));
    String line;
    ArrayList<String> list = new ArrayList<String>();
    while((line = reader.readLine())!=null)
    {
        list.add(line);
    }
    Collections.sort(list);
    PrintWriter out = new PrintWriter(f);
    for(String s:list)
        out.println(s);
    reader.close();
    out.close();
}
public void merge() throws IOException
{
    File dir = new File("/Users/phoenix/workspace/data");
    File merged = new File("/Users/phoenix/workspace/data/merged.txt");
    ArrayList<BufferedReader> readers = new ArrayList<BufferedReader>(dir.listFiles().length);
    ArrayList<String> list = new ArrayList<String>();
    PrintWriter out = new PrintWriter(merged);
    for(File f: dir.listFiles())
    {
        readers.add(new BufferedReader(new FileReader(f)));
    }
    while(true)
    {
        for (BufferedReader reader: readers)
        {
            if(reader.readLine()!=null)
                list.add(reader.readLine());
            else
            {
                reader.close();
            }
        }
        String min = Collections.min(list);
        int index = list.indexOf(min);
        out.write(min);
    }
}
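Answer 0: (score: 2)
Are you trying to solve the problem, or to solve it in Java specifically?
If you are just looking for a way to get it done, have access to a terminal, and by "sorted" you mean sorted alphabetically, you can do this much more simply: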
cat "/Users/phoenix/workspace/data/a.txt" "/Users/phoenix/workspace/data/b.txt" "/Users/phoenix/workspace/data/c.txt" "/Users/phoenix/workspace/data/d.txt"|sort > merged.txt
To sort and keep only the unique lines:
cat "/Users/phoenix/workspace/data/a.txt" "/Users/phoenix/workspace/data/b.txt" "/Users/phoenix/workspace/data/c.txt" "/Users/phoenix/workspace/data/d.txt"|sort |uniq > merged.txt
Update: By the way, to sort numerically, use
sort -n
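Answer 1: (score: 1)
Here is a general description of the algorithm:
Before running the algorithm, your code needs to check that at least one input file exists; otherwise, your code should exit.
Edit: Your merge code does not look much like the algorithm above; here is some code to help you get started: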
// Prepare your readers and their top items
for (File f : dir.listFiles()) {
    BufferedReader br = new BufferedReader(new FileReader(f));
    String firstLine = br.readLine();
    // Your code inserts buffered readers unconditionally;
    // You should not insert readers for empty files.
    if (firstLine != null) {
        readers.add(br);
        list.add(firstLine);
    } else {
        br.close();
    }
}
// Stop when the last reader is removed
while (!readers.isEmpty()) {
    // Find the index of the smallest item in the "list"
    int minIndex = 0;
    for (int i = 1; i < list.size(); i++) {
        if (list.get(i).compareTo(list.get(minIndex)) < 0) {
            minIndex = i;
        }
    }
    // println, not write, so that each id ends up on its own line
    out.println(list.get(minIndex));
    BufferedReader br = readers.get(minIndex);
    String next = br.readLine();
    if (next != null) {
        list.set(minIndex, next);
    } else {
        br.close();
        list.remove(minIndex);
        readers.remove(minIndex);
    }
}
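For more than a handful of files, the linear scan for the smallest item can be swapped for a heap. The following is only a sketch of that variation, not the code from this answer: the class name KWayMerge, the helper class Entry, and the merge(inputs, output) signature are made up for illustration. It assumes every input file is already sorted in String.compareTo order and that duplicates should be written once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; the names are illustrative only.
class KWayMerge {

    // Pairs the current line of one input with the reader it came from.
    private static final class Entry implements Comparable<Entry> {
        final String line;
        final BufferedReader reader;

        Entry(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }

        @Override
        public int compareTo(Entry other) {
            return line.compareTo(other.line);
        }
    }

    // Merges already-sorted inputs into output, writing each distinct line once.
    static void merge(List<Path> inputs, Path output) throws IOException {
        PriorityQueue<Entry> heap = new PriorityQueue<>();
        List<BufferedReader> opened = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            // Seed the heap with the first line of every non-empty file.
            for (Path p : inputs) {
                BufferedReader br = Files.newBufferedReader(p, StandardCharsets.UTF_8);
                opened.add(br);
                String first = br.readLine();
                if (first != null) {
                    heap.add(new Entry(first, br));
                }
            }
            String last = null;
            while (!heap.isEmpty()) {
                Entry smallest = heap.poll();
                // Equal lines come out of the heap back to back, so comparing
                // with the previously written line is enough to drop duplicates.
                if (!smallest.line.equals(last)) {
                    out.write(smallest.line);
                    out.newLine();
                    last = smallest.line;
                }
                // Advance only the reader that supplied the smallest line.
                String next = smallest.reader.readLine();
                if (next != null) {
                    heap.add(new Entry(next, smallest.reader));
                }
            }
        } finally {
            for (BufferedReader br : opened) {
                br.close();
            }
        }
    }
}
```

If the output file lives in the same directory as the inputs, build the input list before creating it (or filter it out), otherwise the merged file would be opened as one of its own inputs.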
Answer 2: (score: 0)
Read each file into a List
List<String> list1 = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
...
Merge the lists into a single list
List<String> list = new ArrayList<>();
list.addAll(list1);
...
Now sort the lines
Collections.sort(list);
and write them to a single file.
Note: if you don't want duplicate lines, use a TreeSet instead of an ArrayList.
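Put together, the steps above might look like the sketch below. The class name InMemoryMerge and the mergeUnique(inputs, output) signature are invented for illustration, and the approach assumes all ids fit in memory at once. It uses the TreeSet variant, so the lines come out sorted and without duplicates and no separate Collections.sort call is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper class; the names are illustrative only.
class InMemoryMerge {

    // Reads every input fully, then writes the distinct lines in sorted order.
    static void mergeUnique(List<Path> inputs, Path output) throws IOException {
        // A TreeSet keeps its elements sorted and silently drops duplicates.
        TreeSet<String> ids = new TreeSet<>();
        for (Path p : inputs) {
            ids.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
        }
        // Files.write emits one line per element, in the set's iteration order.
        Files.write(output, ids, StandardCharsets.UTF_8);
    }
}
```

Unlike a streaming merge, this does not need the input files to be sorted beforehand, at the cost of holding every id in memory.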