I've written some code that works almost the way I want it to. The logic of this Java code is as follows:
Here is the code:
package preproc;

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Preproc {
    public static void main(String[] args) {
        File file = new File("C:\\Users\\AnthonyH\\Desktop\\file.txt");
        BufferedReader br;
        HashMap<String, Integer> hmap = new HashMap<>();
        try {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            int linenumber = 0;
            String event;
            while ((event = br.readLine()) != null) {
                //System.out.println("LINE=" + event);
                Pattern regex = Pattern.compile("^.*url=(.*)");
                Matcher check = regex.matcher(event);
                if (check.find()) {
                    String match = check.group(1);
                    //System.out.println("GROUP=" + match + " LINE=" + linenumber);
                    if (!hmap.containsKey(match)) {
                        //System.out.println("ADDING TO INDEX");
                        hmap.put(match, linenumber);
                    }
                }
                linenumber++;
            }
            List<Integer> lineNumbers = new ArrayList<>(hmap.values());
            //System.out.println("SIZE=" + lineNumbers.size());
            Collections.sort(lineNumbers);
            File file2 = new File("C:\\Users\\AnthonyH\\Desktop\\file2.txt");
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2)));
            int currentLine = 0;
            for (Integer line : lineNumbers) {
                //System.out.println("LINE=" + line + "CURRENT LINE=" + currentLine);
                while (currentLine < line) {
                    reader.readLine();
                    currentLine++;
                }
                writer.write(reader.readLine());
                writer.newLine();
                currentLine++;
            }
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
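The problem I'm facing is that it writes every unique string match into the HashMap, when I only want to add the ones that occur exactly once in the original file. For example, given five instances of site1.com and one instance of site2.com, the map will hold the first instance of site1.com and the only instance of site2.com. I only want site2.com.
Any help is greatly appreciated.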
Answer 0 (score: 0)
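Create a Map<String, Occurrence>, where Occurrence holds the (first) line number and the number of occurrences of the URL. When writing, ignore the lines whose occurrence count is > 1.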
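The Map<String, Occurrence> idea could be sketched roughly as below. This is only an illustration under some assumptions: the Occurrence and OccurrenceIndex classes and the linesWithUniqueUrl method are hypothetical names, and a List<String> stands in for the lines of the input file so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the first line number of a URL plus a running count.
class Occurrence {
    final int firstLine;
    int count = 1;

    Occurrence(int firstLine) {
        this.firstLine = firstLine;
    }
}

class OccurrenceIndex {
    // Same pattern as in the question, compiled once instead of once per line.
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    // Returns the zero-based line numbers whose URL appears exactly once.
    static List<Integer> linesWithUniqueUrl(List<String> lines) {
        // LinkedHashMap keeps first-seen order, so the result is already sorted.
        Map<String, Occurrence> index = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = URL.matcher(lines.get(i));
            if (!m.find()) {
                continue; // line has no "url=" part
            }
            String url = m.group(1);
            Occurrence seen = index.get(url);
            if (seen == null) {
                index.put(url, new Occurrence(i));
            } else {
                seen.count++;
            }
        }
        List<Integer> result = new ArrayList<>();
        for (Occurrence o : index.values()) {
            if (o.count == 1) {
                result.add(o.firstLine);
            }
        }
        return result;
    }
}
```

The returned line numbers can then feed the same "skip ahead and write" loop the question already uses for its second pass over the file.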
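That is one way; there are others.
You could keep a Set of the URLs that have been matched at least twice. Whenever you find a URL that is already in the map, add it to the set. When writing, ignore the URLs that are in the set.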
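A minimal sketch of the Set variant, again with a List<String> standing in for the file. The class name RepeatedUrlSet and the method linesWithUniqueUrl are made up for this illustration; the map is the same URL-to-first-line map the question builds.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedUrlSet {
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    // Returns the lines whose URL was never seen a second time.
    static List<String> linesWithUniqueUrl(List<String> lines) {
        Map<String, Integer> firstLine = new LinkedHashMap<>(); // URL -> first line number
        Set<String> repeated = new HashSet<>();                 // URLs seen at least twice
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = URL.matcher(lines.get(i));
            if (!m.find()) {
                continue;
            }
            String url = m.group(1);
            if (firstLine.containsKey(url)) {
                repeated.add(url); // already indexed, so this is a repeat
            } else {
                firstLine.put(url, i);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : firstLine.entrySet()) {
            if (!repeated.contains(e.getKey())) {
                out.add(lines.get(e.getValue()));
            }
        }
        return out;
    }
}
```

Compared with a counter per URL, this keeps the question's HashMap<String, Integer> untouched and only adds one extra collection.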
Note that, if the file is not too large, you could store the lines in memory instead of re-reading the file.
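For a file that fits comfortably in memory, something like the following might do, counting URLs in one pass over the stored lines and filtering in a second. InMemoryFilter, urlOf, keepUniqueUrlLines and filterFile are hypothetical names chosen for this sketch; Files.readAllLines and Files.write assume the platform's default UTF-8 handling is acceptable for the input.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InMemoryFilter {
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    // Returns the URL part of a line, or null if the line has no "url=".
    static String urlOf(String line) {
        Matcher m = URL.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Keeps only the lines whose URL occurs exactly once, in original order.
    static List<String> keepUniqueUrlLines(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String url = urlOf(line);
            if (url != null) {
                counts.merge(url, 1, Integer::sum); // increment the count for this URL
            }
        }
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String url = urlOf(line);
            if (url != null && counts.get(url) == 1) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Reads the whole input into memory, filters it, and writes the result.
    static void filterFile(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in);
        Files.write(out, keepUniqueUrlLines(lines));
    }
}
```

The file is read only once here, at the cost of holding every line in memory, so it is a trade-off rather than a drop-in improvement for very large inputs.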
Answer 1 (score: 0)
package preproc;

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Preproc {
    public static void main(String[] args) {
        File file = new File("C:\\Users\\AnthonyH\\Desktop\\file.txt");
        BufferedReader br;
        // Maps each URL to every line number it appears on, in first-seen order.
        HashMap<String, List<Integer>> hmap = new LinkedHashMap<String, List<Integer>>();
        try {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            int linenumber = 0;
            String event;
            // Compile the pattern once rather than on every line.
            Pattern regex = Pattern.compile("^.*url=(.*)");
            while ((event = br.readLine()) != null) {
                Matcher check = regex.matcher(event);
                if (check.find()) {
                    String match = check.group(1);
                    List<Integer> lineNumbers = new ArrayList<Integer>();
                    if (hmap.containsKey(match)) {
                        lineNumbers = hmap.get(match);
                    }
                    lineNumbers.add(linenumber);
                    hmap.put(match, lineNumbers);
                }
                linenumber++;
            }
            br.close();
            List<List<Integer>> lineNumbers = new ArrayList<List<Integer>>(hmap.values());
            File file2 = new File("C:\\Users\\AnthonyH\\Desktop\\file2.txt");
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2)));
            for (List<Integer> linesOccurences : lineNumbers) {
                int currentLine = 0;
                // Only URLs that occurred exactly once are written out.
                if (linesOccurences.size() == 1) {
                    // Re-open the file so reading starts from the first line again.
                    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
                    // A one-element list only has index 0; get(1) would throw IndexOutOfBoundsException.
                    int line = linesOccurences.get(0);
                    // Skip the lines before the target line.
                    while (currentLine++ < line) {
                        reader.readLine();
                    }
                    writer.write(reader.readLine());
                    writer.newLine();
                    reader.close();
                }
            }
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Try this edited code. In the previous one, the BufferedReader object was not in the right place.