Question

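I have a million Employee objects in a List. During processing, the Employee objects are modified and the list is sorted many times. Also during processing, I need to fetch Employee objects by department. That means I have to maintain a Map with the department as the key and a List as the value.

During processing, RAM usage shoots up to 100+ GB, while the employee file itself is only about 2 GB.

The List is the master list, and the Map is there for convenience (fetching by department).

Now, my question is: how do I avoid duplicating Employee objects in the List and the Map? Iterating over the List every time getByDept is called is expensive and time-consuming.

I need a Map with the List as its backing data. Any change made to an Employee object in the List should also be reflected in the Map.

Any ideas on building a data structure without duplicating the Employee objects in the Map?

Thanks.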

Answer 1

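Try using a Set instead of a List:

A collection that contains no duplicate elements.

Adding the same element to a Set twice does not change the Set.

Remember to define equals (and hashCode) so that the Set works correctly, because hash-based implementations such as HashSet use them internally.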
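
As a minimal sketch of that advice, the example below defines equals and hashCode on a hypothetical Employee class and adds two logically equal instances to a HashSet. The class shape and the choice of id as the identity field are assumptions for illustration; the question does not show its Employee class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the names here are illustrative, not from the question.
class EmployeeSetDemo {
    // Adds two logically equal employees plus a distinct one and returns the set.
    static Set<Employee> buildSet() {
        Set<Employee> employees = new HashSet<>();
        employees.add(new Employee(1, "Allen"));
        employees.add(new Employee(1, "Allen")); // equal by id, so the set is unchanged
        employees.add(new Employee(2, "Alder"));
        return employees;
    }

    public static void main(String[] args) {
        System.out.println(buildSet().size()); // prints 2
    }
}

class Employee {
    final int id; // assumed to be a unique identifier that never changes
    String name;  // mutable fields are kept out of equals/hashCode

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Employee)) return false;
        return id == ((Employee) o).id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}
```

Basing equality on an immutable field matters here because the employees are modified during processing: if a field used by hashCode changes while the object is in a HashSet, the set can no longer find it.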

Answer 2

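Add your list to an empty Set; this removes all duplicate elements from your employee list. You can then convert the Set back into a List. The code below uses a Set to remove duplicate elements from a List. With no duplicate employees, your searches will be faster.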

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DedupDemo {
    public static void main(String[] args) {
        // Part 1: remove duplicates from a List by passing it through a Set
        Map<Integer, List<String>> employeeMap = new HashMap<>();
        List<String> list = new ArrayList<>();
        list.add("Allen");
        list.add("Alder");
        list.add("Allen");
        // The HashSet keeps only one "Allen"
        Set<String> set = new HashSet<>(list);
        List<String> employeeList = new ArrayList<>(set);
        // Add the de-duplicated list to the Map
        employeeMap.put(1, employeeList);
        // You can edit the list through the Map, but a List does not reject duplicates
        employeeMap.get(1).add("werner");

        // Part 2: store a Set in the Map instead of a List
        Map<Integer, Set<String>> employeeMapUsingSet = new HashMap<>();
        // TreeSet keeps its elements sorted and rejects duplicates
        Set<String> employeeSet = new TreeSet<>(list);
        // Add the Set to the Map
        employeeMapUsingSet.put(1, employeeSet);
        // Editing through the Map changes the same Set object
        employeeMapUsingSet.get(1).add("werner");
        // This duplicate is ignored by the Set
        employeeMapUsingSet.get(1).add("Alder");
        // Add Nancy directly to the Set that is already in the Map
        employeeSet.add("Nancy");
        // Prints Nancy too, because the Map stores a reference to the same Set, not a copy
        System.out.println(employeeMapUsingSet.get(1)); // [Alder, Allen, Nancy, werner]
    }
}

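Try this: because the Map stores a reference to the employee Set, any change made to the Set is also reflected in the Map. TreeSet is used here for sorting. Until you reassign employeeSet to a new TreeSet, the variable and the Map refer to the same object, so any change is visible through both. Hope this helps.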

Answer 3

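"Sorted many times" may be the source of the memory usage, depending on how you do it.

For example, the default List.sort implementation copies all of the list's elements into a new array every time you sort. (ArrayList overrides it to sort its backing array directly, although the sort may still allocate temporary storage.)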
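
To make the distinction concrete, here is a small sketch contrasting an in-place sort with a sorted copy. The Employee class, its fields, and the helper names are assumptions for illustration. In both cases only references are rearranged or copied; the Employee objects themselves are never duplicated, so if memory grows by tens of gigabytes it is worth checking whether sorted copies (or newly created Employee objects) are being retained somewhere.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class SortDemo {
    // Sorts the given list in place; no new list and no new Employee objects are created.
    static void sortInPlace(List<Employee> employees) {
        employees.sort(Comparator.comparingInt((Employee e) -> e.age));
    }

    // Builds a sorted copy: a second list of references that point at the same objects.
    static List<Employee> sortedCopy(List<Employee> employees) {
        List<Employee> copy = new ArrayList<>(employees);
        copy.sort(Comparator.comparingInt((Employee e) -> e.age));
        return copy;
    }

    public static void main(String[] args) {
        List<Employee> employees = new ArrayList<>();
        employees.add(new Employee("c", 13));
        employees.add(new Employee("a", 11));
        employees.add(new Employee("b", 12));

        List<Employee> copy = sortedCopy(employees);
        sortInPlace(employees);

        // Both lists hold the very same object at index 0.
        System.out.println(copy.get(0) == employees.get(0)); // prints true
    }
}

class Employee {
    final String name;
    final int age;

    Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
```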

Answer 4

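How do I avoid duplicating Employee objects in the List and the Map? Iterating over the List every time getByDept is called is expensive and time-consuming.

Maps and Lists hold references to objects that live on the JVM heap; they do not hold copies. So when you add or remove employees in a list, you get the updated list from the Map as well, because the Map points to that same list. The example below demonstrates this: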

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Demo {
    public static void main(String[] args) {
        // Department id -> employees in that department
        Map<Integer, List<Employee>> hashMap = new HashMap<>();
        List<Employee> empList1 = new ArrayList<>();
        empList1.add(new Employee(3, "c", 13));
        empList1.add(new Employee(2, "b", 12));
        empList1.add(new Employee(1, "a", 11));

        List<Employee> empList2 = new ArrayList<>();
        empList2.add(new Employee(6, "f", 16));
        empList2.add(new Employee(5, "e", 15));
        empList2.add(new Employee(4, "d", 14));

        // The Map stores references to the two lists, not copies of them
        hashMap.put(101, empList1);
        hashMap.put(102, empList2);

        System.out.println("Before::::::::::");
        hashMap.forEach((x, y) -> System.out.println(x + " " + y));

        // Fetch the list through the Map and add to it; this is the same object as empList1
        List<Employee> list = hashMap.get(101);
        list.add(new Employee(10, "z", 18));

        System.out.println("After::::::::::");
        hashMap.forEach((x, y) -> System.out.println(x + " " + y));
    }
}

class Employee {
    int id;
    String name;
    int age;
    public Employee(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return id + " : " + name + " : " + age;
    }
}
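
Applied to the setup in the question, the same idea might look like the sketch below: one master list plus a department index, both holding references to the same Employee objects. The EmployeeRegistry class, the dept field, and the method names other than getByDept are hypothetical, since the question does not show its own code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: a master list and a per-department index over the same objects.
class EmployeeRegistry {
    private final List<Employee> all = new ArrayList<>();
    private final Map<String, List<Employee>> byDept = new HashMap<>();

    // Stores one reference in the master list and one in the department index.
    void add(Employee e) {
        all.add(e);
        byDept.computeIfAbsent(e.dept, d -> new ArrayList<>()).add(e);
    }

    List<Employee> getAll() {
        return all;
    }

    // Constant-time lookup instead of scanning the master list.
    List<Employee> getByDept(String dept) {
        return byDept.getOrDefault(dept, Collections.emptyList());
    }

    public static void main(String[] args) {
        EmployeeRegistry registry = new EmployeeRegistry();
        registry.add(new Employee(1, "a", "HR"));
        registry.add(new Employee(2, "b", "IT"));

        // Modify an employee through the master list...
        registry.getAll().get(0).name = "renamed";
        // ...and the change is visible through the department index.
        System.out.println(registry.getByDept("HR").get(0).name); // prints renamed
    }
}

class Employee {
    final int id;
    String name;
    final String dept; // assumed field; kept final so the index cannot go stale

    Employee(int id, String name, String dept) {
        this.id = id;
        this.name = name;
        this.dept = dept;
    }
}
```

The extra cost of the index is one reference per employee, not a second Employee object. If employees can change department, the index would need to be updated at the same time; this sketch sidesteps that by making dept final.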

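Any ideas on building a data structure without duplicating the Employee objects in the Map?

Since you need to remove duplicates and sort the list many times, a better approach is to use a TreeSet.

The advantage of a TreeSet is that you get distinct employees, and they are also kept in sorted order.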
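
A minimal sketch of the TreeSet approach follows. The Employee class and the choice to order by id are assumptions for illustration. Two caveats apply: a TreeSet decides uniqueness with its comparator rather than with equals, and the field used for ordering should not change while the object is in the set, which matters here because the employees are modified during processing.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical demo class; names are illustrative only.
class TreeSetDemo {
    // Builds a set ordered by id; elements whose ids compare equal count as duplicates.
    static TreeSet<Employee> build() {
        TreeSet<Employee> employees =
                new TreeSet<>(Comparator.comparingInt((Employee e) -> e.id));
        employees.add(new Employee(3, "c"));
        employees.add(new Employee(1, "a"));
        employees.add(new Employee(3, "duplicate id")); // same id as an existing element, not added
        return employees;
    }

    public static void main(String[] args) {
        TreeSet<Employee> employees = build();
        System.out.println(employees.size());     // prints 2
        System.out.println(employees.first().id); // prints 1
    }
}

class Employee {
    final int id; // sort key; final so the ordering cannot be broken by later edits
    String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
```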

Data structure: avoiding duplication between a List and a Map
