Need to find and remove duplicates from a text file, comparing the 1st and 5th strings of each line

Time: 2015-11-01 15:14:03

Tags: java

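As part of a project I am working on, I want to clean up a file that is generated with duplicate line entries. These duplicates are usually not adjacent to each other, however. I came up with a way to do this in Java (it basically looks for duplicates in the file: I store the two strings of each line in two ArrayLists and iterate over them), but because of the nested for loops it does not enter the condition properly, so it does not work.

However, I need an integrated solution, preferably in Java. Any ideas?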

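    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.StringTokenizer;

    public class duplicates {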
        static BufferedReader reader = null;
        static BufferedWriter writer = null;
        static String currentLine;

        public static void main(String[] args) throws IOException {
            int count=0,linecount=0;
            String fe = null,fie = null,pe=null;
            File file = new File("E:\\Book.txt");

            ArrayList<String> list1=new ArrayList<String>();
            ArrayList<String> list2=new ArrayList<String>();

            reader = new BufferedReader(new FileReader(file));

            while((currentLine = reader.readLine()) != null)
            {
                StringTokenizer st = new StringTokenizer(currentLine,"/");  //splits data into strings
                while (st.hasMoreElements()) {
                    count++;
                    fe=(String) st.nextElement();
                    //System.out.print(fe+"/// ");

                    //System.out.println("count="+count);
                    if(count==1){                                            //stores 1st string 
                        pe=fe;
                        //  System.out.println("first element "+fe);
                    }
                    else if(count==5){
                        fie=fe;                                              //stores 5th string
                        //  System.out.println("fifth element "+fie);
                    }
                }
                count=0;

                if(linecount>0){
                    for(String s1:list1)
                    {
                        for(String s2:list2){
                            if(pe.equals(s1)&&fie.equals(s2)){                              //checking condition
                                System.out.println("duplicate found");
                                //System.out.println(s1+ "   "+s2);
                            }        
                        }
                    }
                }                     
                list1.add(pe);
                list2.add(fie);
                linecount++;
            }
        }
    }

Input:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/

Expected output:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/

2 answers:

Answer 0 (score: 1)

    // Requires: import java.util.ArrayList; import java.util.Arrays;
    public static void removeDups() {
        String[] input = new String[] { // Let's say you read the whole file into this string array
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };
        ArrayList<String> outPut = new ArrayList<>(); // List for storing the output, i.e. the distinct lines
        Arrays.stream(input).distinct().forEach(x -> outPut.add(x)); // With Java 8 streams, distinct() drops repeated lines
        // Printing is just for the example; write outPut back to the file using your own implementation.
        outPut.forEach(System.out::println);
    }

The output when running this method is:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/

Edit

Non-Java 8 answer

    public static void removeDups() {
        String[] input = new String[] {
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };

        LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input)); // output holds the unique strings in their original order

    }
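
Both variants above work on a hard-coded array and leave the file handling to the reader. A minimal sketch of the same whole-line approach applied to a file could look like the following. The class name `WholeLineDedup`, the helper `distinctLines`, and the command-line paths are assumptions for illustration, not part of the original answer, and the whole file is assumed to fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class WholeLineDedup {

    // Returns the distinct lines in first-seen order.
    // LinkedHashSet drops repeats while keeping insertion order.
    static List<String> distinctLines(List<String> lines) {
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: input and output paths come from the command line.
        if (args.length != 2) {
            System.err.println("usage: WholeLineDedup <input> <output>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);

        // Reads the whole file into memory, so this assumes the file is not huge.
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, distinctLines(lines), StandardCharsets.UTF_8);
    }
}
```

Like the answer's code, this compares whole lines, so the `ERJ170` line is kept even though the question's expected output omits it.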

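Answer 1 (score: 1)

Use Set<String> instead of ArrayList<String>.

Duplicates are not allowed in a Set, so if you just add each line to it and then read the elements back out, you will have all the distinct strings.

Performance-wise, it is also faster than your nested for loops: a hash-based set checks membership in roughly constant time, while the nested loops rescan all earlier entries for every line.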
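
The question's expected output drops the `ERJ170` line, which suggests that only the 1st and 5th `/`-separated strings of each line should be compared. A rough sketch of the Set idea under that assumption is shown below. The names `KeyDedup`, `keyOf`, and `dedupByKey` are made up for this example, and treating lines with fewer than five tokens as their own key is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

public class KeyDedup {

    // Builds a key from the 1st and 5th "/"-separated tokens, the same
    // tokens the question's code extracts with StringTokenizer.
    // Assumption: a line with fewer than five tokens is keyed by the whole line.
    static List<String> keyOf(String line) {
        StringTokenizer st = new StringTokenizer(line, "/");
        String first = null;
        String fifth = null;
        int count = 0;
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            count++;
            if (count == 1) {
                first = token;
            } else if (count == 5) {
                fifth = token;
            }
        }
        if (fifth == null) {
            return Arrays.asList(line);
        }
        return Arrays.asList(first, fifth);
    }

    // Keeps the first line seen for each key, in input order.
    static List<String> dedupByKey(List<String> lines) {
        Set<List<String>> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // Set.add returns false when the key is already present.
            if (seen.add(keyOf(line))) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/");
        for (String line : dedupByKey(input)) {
            System.out.println(line);
        }
    }
}
```

With the sample input from the question, this keeps six lines and drops the `ERJ170` line, because its 1st and 5th tokens (`jangeer`, `01_Highlights`) already appeared in an earlier line. The kept lines are collected in a separate list, so the order does not depend on the iteration order of the `HashSet`.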