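As part of a project I'm working on, I want to clean up a file that gets generated with duplicate line entries. These duplicates are usually not next to each other, though. I came up with a way of doing this in Java: it basically looks for duplicates in the file by storing two strings from each line in two ArrayLists and iterating over them. It does not work, because the nested for loops do not hit the condition correctly.
However, I need an integrated solution, preferably in Java. Any ideas?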
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;

public class duplicates {
    static BufferedReader reader = null;
    static String currentLine;

    public static void main(String[] args) throws IOException {
        int count = 0, linecount = 0;
        String fe = null, fie = null, pe = null;
        File file = new File("E:\\Book.txt");
        ArrayList<String> list1 = new ArrayList<String>();
        ArrayList<String> list2 = new ArrayList<String>();
        reader = new BufferedReader(new FileReader(file));
        while ((currentLine = reader.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(currentLine, "/"); // splits the line into segments
            while (st.hasMoreElements()) {
                count++;
                fe = (String) st.nextElement();
                if (count == 1) {        // stores the 1st segment
                    pe = fe;
                } else if (count == 5) { // stores the 5th segment
                    fie = fe;
                }
            }
            count = 0;
            if (linecount > 0) {
                // Problem area: every s1 is paired with every s2, so the check is not
                // limited to values that came from the same earlier line.
                for (String s1 : list1) {
                    for (String s2 : list2) {
                        if (pe.equals(s1) && fie.equals(s2)) { // checking condition
                            System.out.println("duplicate found");
                        }
                    }
                }
            }
            list1.add(pe);
            list2.add(fie);
            linecount++;
        }
        reader.close();
    }
}
Input:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
Output:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
Answer 0 (score: 1)
// Requires: import java.util.ArrayList; import java.util.Arrays;
public static void removeDups() {
    // Assume the whole file has been read into this array.
    String[] input = new String[] {
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/jangeer/_cwc/Crj_200/customer/plots/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
        "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
        "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
    };
    // Holds the distinct lines; distinct() keeps the first occurrence of each, in order.
    ArrayList<String> output = new ArrayList<>();
    Arrays.stream(input).distinct().forEach(output::add); // Java 8 streams
    // Printed here only as an example; write the lines back to the file with your own implementation.
    output.forEach(System.out::println);
}
The output when running this method is:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
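The method above starts from a hard-coded array and leaves the file handling to the reader. A minimal sketch of that surrounding I/O follows, assuming the file fits in memory and is UTF-8 encoded. The class name `DistinctLines` and the default file names are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

class DistinctLines {
    // Drops later repeats of a line; first occurrences keep their original order.
    static List<String> distinctLines(List<String> lines) {
        return lines.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default paths; pass your own as the two arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "Book.txt");
        Path out = Paths.get(args.length > 1 ? args[1] : "Book_dedup.txt");

        // Assumption: the input is UTF-8 and small enough to load at once.
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, distinctLines(lines), StandardCharsets.UTF_8);
    }
}
```

Writing to a separate output file rather than overwriting the input is a deliberate choice in this sketch, so the original stays available if the result is not what you expected.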
Edit
Non-Java 8 answer:
// Requires: import java.util.Arrays; import java.util.LinkedHashSet;
public static void removeDups() {
    String[] input = new String[] {
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/jangeer/_cwc/Crj_200/customer/plots/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
        "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
        "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
    };
    // output is your set of unique strings, in their original order
    LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input));
}
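Both versions above compare whole lines, so they keep the ERJ170 line that the output listed in the question leaves out. The question's code compares only the 1st and 5th path segments. If that is the intended definition of a duplicate, a sketch along the following lines may be closer to what was asked. This is an assumption about the asker's intent, and `KeyDedup`, `keyOf` and `dedupeByKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeyDedup {
    // Key = (1st segment, 5th segment), ignoring empty segments around slashes.
    // Assumption: a line with fewer than five segments is keyed by the whole line.
    static List<String> keyOf(String line) {
        List<String> segments = new ArrayList<>();
        for (String part : line.split("/")) {
            if (!part.isEmpty()) {
                segments.add(part);
            }
        }
        if (segments.size() < 5) {
            return Collections.singletonList(line);
        }
        return Arrays.asList(segments.get(0), segments.get(4));
    }

    // Keeps the first line seen for each key, in input order.
    static List<String> dedupeByKey(List<String> lines) {
        Set<List<String>> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (seen.add(keyOf(line))) { // add() returns false for a repeated key
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/book1/_cwc/B737/customer/Special_Reports/",
            "/jangeer/_cwc/Crj_200/customer/plots/",
            "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
            "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
            "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
            "/jangeer/_cwc/ERJ170/customer/01_Highlights/");
        dedupeByKey(input).forEach(System.out::println);
    }
}
```

With the sample input from the question, this keeps six lines and drops the ERJ170 entry, because it shares both `jangeer` and `01_Highlights` with the line before it. A single hash lookup per line replaces the two nested loops, and keeping both segments together in one key avoids matching a first segment from one earlier line with a fifth segment from another.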
Answer 1 (score: 1)
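Use a Set<String> instead of an ArrayList<String>.
A Set does not allow duplicates, so if you simply add every line to it and then read the elements back out, you are left with only the distinct strings. Note that a plain HashSet does not keep insertion order; use a LinkedHashSet if the order of the lines matters.
Performance-wise it is also faster than your nested for loops, since a hash-based set checks for an existing entry without scanning every line stored so far.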
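As a rough illustration of this suggestion, the sketch below streams lines from a reader to a writer and uses the return value of `Set.add` to decide whether a line has been seen before. `SetDedup` and `copyDistinct` are hypothetical names, and the in-memory sample in `main` stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

class SetDedup {
    // Copies each line to out the first time it appears; returns how many lines were skipped.
    static int copyDistinct(Reader in, Writer out) throws IOException {
        Set<String> seen = new HashSet<>();
        int dropped = 0;
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            if (seen.add(line)) { // add() returns false if the line is already in the set
                writer.write(line);
                writer.newLine();
            } else {
                dropped++;
            }
        }
        writer.flush();
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample; a FileReader and FileWriter could be passed instead.
        StringWriter out = new StringWriter();
        int dropped = copyDistinct(new StringReader("a\nb\na\n"), out);
        System.out.print(out);
        System.out.println("dropped: " + dropped);
    }
}
```

Because the lines are written in the order they are read, the output order is preserved even though the HashSet itself is unordered. Only the distinct lines are held in memory, not the whole file.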