The most efficient way to eliminate duplicate entries

Time: 2019-12-24 11:35:37

Tags: java performance arraylist set

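I want to store information in an ArrayList. I read the data from a CSV file, but the file contains duplicate records, and I want to eliminate them. What is the most efficient way to do this? I have considered two approaches: (1) add all the data to a Set and then convert it to an ArrayList; (2) add the records to an ArrayList, checking each time whether the list already contains the same record. Here is my code: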

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public static List<Flight> sanitization(String file_path) throws IOException {

    Set<Flight> flights_set = new HashSet<>(); //All valid flights are added to a set so that the same flight is never stored twice.

    //try-with-resources closes the reader even if parsing fails
    try (BufferedReader reader = new BufferedReader(new FileReader(file_path))) { //read the csv file
        String st;
        while ((st = reader.readLine()) != null) {
            String[] split = st.split(",", -1); //a negative limit keeps trailing empty fields
            flights_set.add(new Flight(split[4], split[5], Integer.valueOf(split[11]), split[7], split[8], Integer.valueOf(split[0]), Integer.valueOf(split[1]), Integer.valueOf(split[2])));
        }

        //Second possible way (instead of the loop above)
        /*List<Flight> flights_arraylist = new ArrayList<>();
        while ((st = reader.readLine()) != null) {
            String[] split = st.split(",", -1);
            Flight f = new Flight(split[4], split[5], Integer.valueOf(split[11]), split[7], split[8], Integer.valueOf(split[0]), Integer.valueOf(split[1]), Integer.valueOf(split[2]));

            if (!flights_arraylist.contains(f)) //contains() scans the whole list
                flights_arraylist.add(f);
        }*/
    }

    List<Flight> flights_arraylist = new ArrayList<>(flights_set);
    return flights_arraylist;
}

import java.util.Objects;

class Flight implements Comparable<Flight> {

    //All necessary information
    public String airline;
    public String flight_number;
    public Integer departure_delay;
    public String origin_airport_name;
    public String destination_airport_name;
    public Integer year;
    public Integer month;
    public Integer day;

    //Constructor
    public Flight(String airline, String flight_number, Integer departure_delay, String origin_airport_name, String destination_airport_name, Integer year, Integer month, Integer day) {
        this.airline = airline;
        this.flight_number = flight_number;
        this.departure_delay = departure_delay;
        this.origin_airport_name = origin_airport_name;
        this.destination_airport_name = destination_airport_name;
        this.year = year;
        this.month = month;
        this.day = day;
    }

    public Flight() {

    }

    //A flight is "bigger" if its departure delay is bigger
    @Override
    public int compareTo(Flight o) {
        return Integer.compare(this.departure_delay, o.departure_delay);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Flight)) return false; //also covers obj == null
        Flight f = (Flight) obj;

        return Objects.equals(this.airline, f.airline)
                && Objects.equals(this.flight_number, f.flight_number)
                && Objects.equals(this.departure_delay, f.departure_delay)
                && Objects.equals(this.origin_airport_name, f.origin_airport_name)
                && Objects.equals(this.destination_airport_name, f.destination_airport_name)
                && Objects.equals(this.year, f.year)
                && Objects.equals(this.month, f.month)
                && Objects.equals(this.day, f.day);
    }

    //Must be consistent with equals(): equal flights need equal hash codes.
    //Returning a constant such as 0 is legal, but it puts every flight into the same
    //HashSet bucket and throws away the speed advantage of hashing.
    @Override
    public int hashCode() {
        return Objects.hash(airline, flight_number, departure_delay, origin_airport_name, destination_airport_name, year, month, day);
    }

    @Override
    public String toString() {
        return this.airline + " " + this.flight_number + " " + this.departure_delay;
    }

}
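The two approaches described in the question can be compared side by side. The sketch below is a minimal, generic version, assuming the element type (such as Flight) implements equals() and hashCode() consistently; the class DedupApproaches and its method names are hypothetical. It swaps in LinkedHashSet for HashSet so that the first occurrence of each record keeps its original position, which a plain HashSet does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class illustrating the two approaches from the question.
class DedupApproaches {

    // Approach 1: collect into a set, then copy into a list.
    // LinkedHashSet keeps insertion order; each insertion is O(1) on average.
    static <T> List<T> dedupWithSet(List<T> input) {
        Set<T> seen = new LinkedHashSet<>(input);
        return new ArrayList<>(seen);
    }

    // Approach 2: check the list before every add.
    // contains() scans the list, so the whole loop is O(n^2) in the worst case.
    static <T> List<T> dedupWithListContains(List<T> input) {
        List<T> result = new ArrayList<>();
        for (T item : input) {
            if (!result.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("AA,100", "DL,200", "AA,100", "UA,300", "DL,200");
        System.out.println(dedupWithSet(rows));          // [AA,100, DL,200, UA,300]
        System.out.println(dedupWithListContains(rows)); // [AA,100, DL,200, UA,300]
    }
}
```

Both methods return the same list; they differ only in how the "already seen?" check is done.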

This is also my first question, so please let me know if I have made any mistakes.

3 answers:

Answer 0 (score: 1)

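You can use streams. Below is an example that processes a list.

First add all elements to the list, then use a stream to collect the distinct elements and assign the result back to the same variable.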

Example:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

List<String> strList = new ArrayList<>();
strList.add("Alpha");
strList.add("Beta");
strList.add("Charlie");
strList.add("Delta");
strList.add("Delta");
strList.add("Delta");

//distinct() keeps the first occurrence of each element, in encounter order
strList = strList.stream().distinct().collect(Collectors.toList());
System.out.println("Without duplicate");
strList.forEach(System.out::println);

Output:

Without duplicate
Alpha
Beta
Charlie
Delta
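For a class such as Flight, distinct() only helps if the class defines what "equal" means: it compares elements with Object.equals(Object), so equals() must be overridden (together with a matching hashCode(), as the general Object contract requires). The sketch below illustrates this with two hypothetical classes, PlainRoute (no overrides, so identity comparison) and Route (field-based equals/hashCode); both names are made up for the example.

```java
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical classes for illustration only.
class DistinctDemo {

    // No equals/hashCode override: identity comparison, so two objects with the same data are "distinct".
    static final class PlainRoute {
        final String airline;
        final String flightNumber;

        PlainRoute(String airline, String flightNumber) {
            this.airline = airline;
            this.flightNumber = flightNumber;
        }
    }

    // equals/hashCode based on the fields, so objects with the same data are equal.
    static final class Route {
        final String airline;
        final String flightNumber;

        Route(String airline, String flightNumber) {
            this.airline = airline;
            this.flightNumber = flightNumber;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Route)) return false;
            Route r = (Route) o;
            return airline.equals(r.airline) && flightNumber.equals(r.flightNumber);
        }

        @Override
        public int hashCode() {
            return Objects.hash(airline, flightNumber);
        }
    }

    static long countPlain() {
        return Stream.of(new PlainRoute("AA", "100"), new PlainRoute("AA", "100")).distinct().count();
    }

    static long countWithEquals() {
        return Stream.of(new Route("AA", "100"), new Route("AA", "100")).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(countPlain());      // 2
        System.out.println(countWithEquals()); // 1
    }
}
```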

Answer 1 (score: 1)

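From the Javadoc of java.util.Set#add: "@return true if this set did not already contain the specified element". In addition, BufferedReader provides a lines() method that returns a stream of the lines in the file. With that in mind, you can write something like: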

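    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    List<Flight> result = new ArrayList<>(); //list of your choice
    Set<Flight> flightSet = new HashSet<>(); //set of your choice
    try (BufferedReader reader = new BufferedReader(new FileReader(file_path))) { //init bufferedReader
        reader.lines()
                .forEach(line -> {
                    String[] split = line.split(",", -1); //transform into object, as in the question
                    Flight flight = new Flight(split[4], split[5], Integer.valueOf(split[11]), split[7], split[8], Integer.valueOf(split[0]), Integer.valueOf(split[1]), Integer.valueOf(split[2]));
                    if (flightSet.add(flight)) { //add() returns false if the flight is already in the set
                        result.add(flight);
                    }
                });
    }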
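Because Set.add() reports whether the element was new, you can filter while reading and keep each first occurrence in its original order. The sketch below is self-contained under a few assumptions: it reads from a StringReader instead of a CSV file, the class name SetAddDemo is hypothetical, and the "duplicate key" is hypothetically the first two columns, standing in for a full Flight object. It uses forEachOrdered rather than forEach because only forEachOrdered guarantees encounter order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetAddDemo {

    // Keeps the first line for each key; the key (first two CSV columns) is a hypothetical choice.
    static List<String> firstOccurrences(BufferedReader reader) {
        List<String> result = new ArrayList<>();
        Set<String> seenKeys = new HashSet<>();
        reader.lines().forEachOrdered(line -> {
            String[] split = line.split(",", -1);
            String key = split[0] + "," + split[1];
            // add() returns true only if the key was not already in the set
            if (seenKeys.add(key)) {
                result.add(line);
            }
        });
        return result;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the CSV file: airline, flight number, delay
        String csv = "AA,100,5\nDL,200,0\nAA,100,9\nUA,300,12\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            System.out.println(firstOccurrences(reader)); // [AA,100,5, DL,200,0, UA,300,12]
        }
    }
}
```

The set only stores keys, so the list keeps the full original lines while the lookup stays O(1) on average.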

Or, using streams throughout, map each line to an object and collect the distinct ones:

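import java.util.stream.Collectors;

List<Flight> flights;
try (BufferedReader reader = new BufferedReader(new FileReader(file_path))) { // init bufferedReader
    flights = reader.lines()
            .map(line -> new Flight(/*... args*/))
            .distinct() // relies on Flight.equals()/hashCode()
            .collect(Collectors.toList());
}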

Answer 2 (score: 0)

To avoid duplicates, every new element has to be checked against the data already stored.

HashSet.contains() runs in O(1) time on average, assuming hashCode() spreads the elements well.

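ArrayList, by contrast, internally uses its indexOf(Object) method to check whether an object is in the list. indexOf(Object) walks through the backing array from the start and compares each element using equals(Object).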

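Back to the complexity analysis: ArrayList.contains() takes O(n) time, so checking each of n records before adding it costs O(n²) in total, compared with O(n) on average for a HashSet.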
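One way to see this difference is to count how often equals() is called. The sketch below uses a hypothetical Item class whose equals() increments a static counter and whose hashCode() is simply the wrapped int, and it inserts n distinct values. The list-based approach compares each new element against every element already stored, which is 0 + 1 + ... + (n - 1) = n(n - 1)/2 calls. HashMap (which backs HashSet) compares stored hash codes before calling equals(), so with all hash codes different it never needs equals() here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper class that counts how often equals() is called.
class EqualsCounter {

    static long equalsCalls = 0;

    static final class Item {
        final int value;

        Item(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Item && ((Item) o).value == value;
        }

        @Override
        public int hashCode() {
            return value; // distinct values give distinct hash codes
        }
    }

    // ArrayList.contains() before each add: compares against every stored element.
    static long countListEquals(int n) {
        equalsCalls = 0;
        List<Item> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Item item = new Item(i);
            if (!list.contains(item)) {
                list.add(item);
            }
        }
        return equalsCalls;
    }

    // HashSet.add() for each element: only same-hash entries would need equals().
    static long countSetEquals(int n) {
        equalsCalls = 0;
        Set<Item> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new Item(i));
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(countListEquals(1000)); // 499500 = 1000 * 999 / 2
        System.out.println(countSetEquals(1000));  // 0
    }
}
```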

The most efficient approach is therefore to store the data in a Set, which keeps no duplicates, and then convert it to a List.