Question

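I have two ArrayLists, A and B.

ArrayList A consists of a set of data that includes an identifier called categoryID. Multiple items in A can have the same categoryID. The categoryIDs of the items in A might look like this: [1, 1, 2, 2, 3, 4, 7].

ArrayList B consists of a different class containing a different set of data, which also includes a categoryID. The categoryID is unique for each item in this list. Example: [1, 2, 3, 4, 5, 6, 7].

Both lists are sorted by categoryID, which hopefully makes this easier.

What I want to do is come up with a new list, C, that contains the items of listB that have at least one match in listA. So list C should contain the items [1, 2, 3, 4, 7] for the input given above.

My strategy so far is to iterate over both lists. I don't believe this is the most efficient approach, so I'm asking what other options I could look at.

My method: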

ArrayList<classB> results = new ArrayList<classB>();
for (classA itemA : listA){
  int categoryID = itemA.categoryID;
  for (classB itemB : listB){
    if (itemB.categoryID == categoryID){
      if (!results.contains(itemB)){
        results.add(itemB);
      }
      break;
    }
  }
}

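I first loop through listA and grab the categoryID, then loop through listB to find the matching categoryID. When I find it, I check whether the results list already contains this item from listB. If it doesn't, I add it to results, break out of the inner for loop, and continue through listA. If the results list already contains itemB, I simply break out of the inner for loop and continue through listA. This method is O(n^2), which isn't great for large data sets. Any ideas for improving it?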

Answer 1

Add all of the category IDs from listA to a Set; let's call it setACategories. Then loop through listB, and if setACategories contains the categoryID of an element in listB, add that element of listB to results.

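results should also be a Set, since it looks like you only want one match from listB to go into results rather than multiple matches (which lets you avoid calling results.contains(itemB)).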
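For illustration, here is a minimal sketch of the Set-based idea, not the answerer's own code. ItemA and ItemB are hypothetical stand-ins for the asker's classA and classB, reduced to a single categoryID field. It keeps results as a List rather than a Set, on the assumption stated in the question that categoryID is unique within listB, so a single pass over listB cannot add the same item twice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetIntersectionSketch {
    // Hypothetical stand-ins for the asker's classA and classB.
    public static class ItemA {
        public final int categoryID;
        public ItemA(int categoryID) { this.categoryID = categoryID; }
    }

    public static class ItemB {
        public final int categoryID;
        public ItemB(int categoryID) { this.categoryID = categoryID; }
    }

    public static List<ItemB> intersect(List<ItemA> listA, List<ItemB> listB) {
        // Collect every categoryID that occurs in listA; duplicates collapse in the Set.
        Set<Integer> setACategories = new HashSet<>();
        for (ItemA itemA : listA) {
            setACategories.add(itemA.categoryID);
        }

        // One pass over listB with an expected O(1) lookup per element.
        List<ItemB> results = new ArrayList<>();
        for (ItemB itemB : listB) {
            if (setACategories.contains(itemB.categoryID)) {
                results.add(itemB);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<ItemA> listA = new ArrayList<>();
        for (int id : new int[] {1, 1, 2, 2, 3, 4, 7}) {
            listA.add(new ItemA(id));
        }
        List<ItemB> listB = new ArrayList<>();
        for (int id = 1; id <= 7; id++) {
            listB.add(new ItemB(id));
        }
        // Prints 1, 2, 3, 4 and 7, one per line.
        for (ItemB itemB : intersect(listA, listB)) {
            System.out.println(itemB.categoryID);
        }
    }
}
```

This runs in expected O(n + m) time, at the cost of O(n) extra memory for the Set of IDs.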

Answer 2

Add the categoryID values from listA to a Set, then iterate over listB, picking out the elements whose categoryID is in your set.
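The same idea can also be written with streams. The sketch below is only one possible shape: intersectBy and its key-extractor parameters are hypothetical names introduced here so the example works for any two element types, and plain Integers stand in for the asker's objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamIntersectionSketch {
    // Keeps the elements of listB whose key also occurs as a key in listA.
    public static <A, B, K> List<B> intersectBy(
            List<A> listA, List<B> listB,
            Function<A, K> keyOfA, Function<B, K> keyOfB) {
        // Build the Set of keys once so each lookup is expected O(1).
        Set<K> keysInA = listA.stream().map(keyOfA).collect(Collectors.toSet());
        return listB.stream()
                .filter(b -> keysInA.contains(keyOfB.apply(b)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 2, 2, 3, 4, 7);
        List<Integer> b = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Prints [1, 2, 3, 4, 7]
        System.out.println(intersectBy(a, b, Function.identity(), Function.identity()));
    }
}
```

With real classes, the two extractors would be something like lambdas that read each object's categoryID field.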

Answer 3

The best way nowadays is to use Java streams:

List<foo> list1 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<foo> list2 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
list1.stream().filter(f -> list2.contains(f)).collect(Collectors.toList());

However, I use the Apache Commons library for this kind of thing myself:

https://commons.apache.org/proper/commons-collections/javadocs/api-3.2.1/org/apache/commons/collections/CollectionUtils.html

Answer 4

Have you tried this?

public void test() {
    Collection<String> c1 = new ArrayList<>();
    Collection<String> c2 = new ArrayList<>();

    c1.add("Text 1");
    c1.add("Text 2");
    c1.add("Text 3");
    c1.add("Text 4");
    c1.add("Text 5");

    c2.add("Text 3");
    c2.add("Text 4");
    c2.add("Text 5");
    c2.add("Text 6");
    c2.add("Text 7");

    // retainAll modifies c1 in place, keeping only the elements also found in c2.
    c1.retainAll(c2);

    for (String next : c1) {
        System.out.println(next);  // Output: Text 3, Text 4, Text 5
    }
}

Answer 5

Try Sets.intersection(Set<E> set1, Set<?> set2) from Google Guava.

Of course, you can use Sets.newHashSet(Iterable<? extends E> elements) to convert your arrays to sets.

Answer 6

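See the code below. I have implemented an intersection that takes advantage of the lists being sorted, improving on the approach in the top answer.

It works a bit like the merge step of merge sort, except that it builds the intersection. It could probably be improved further; I wrote it in 30 minutes.

With the current data, it runs about 17 times faster than the top answer. It also saves O(n) memory, since it only needs one set.

See also: The intersection of two sorted arrays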

import java.util.*;

public class test {
    public static void main (String[] args) {
        List<Integer> a1 = new ArrayList<Integer>();
        List<Integer> a2 = new ArrayList<Integer>();
        Random r = new Random();

        for(int i = 0; i < 1000000; i++) {
            a1.add(r.nextInt(1000000));
            a2.add(r.nextInt(1000000));
        }

        Collections.sort(a1);
        Collections.sort(a2);

        System.out.println("Starting");

        long t1 = System.currentTimeMillis();
        Set<Integer> set1 = func1(a1, a2);
        long t2 = System.currentTimeMillis();

        System.out.println("Func1 done in: " + (t2-t1) + " milliseconds.");

        long t3 = System.currentTimeMillis();
        Set<Integer> set2 = func2(a1, a2);
        long t4 = System.currentTimeMillis();

        System.out.println("Func2 done in: " + (t4-t3) + " milliseconds.");

        if(set1.size() != set2.size()) {
            System.out.println("ERROR - sizes not equal");
            System.exit(1);
        }

        for(Integer t : set1) {
            if (!set2.contains(t)) {
                System.out.println("ERROR");
                System.exit(1);
            }
        }
    }

    public static Set<Integer> func1(List<Integer> a1, List<Integer> a2) {
        Set<Integer> intersection = new HashSet<Integer>();

        int index = 0;
        for(Integer a : a1) {

            while( index < a2.size() && a2.get(index) < a) {
                index++;
            } 

            if(index == a2.size()) { 
                break;
            }
            if (a2.get(index).equals(a)) {
                intersection.add(a);
            } else {
                continue;
            }

        }

        return intersection;
    }

    public static Set<Integer> func2(List<Integer> a1, List<Integer> a2) {
        Set<Integer> intersection = new HashSet<Integer>();
        Set<Integer> tempSet = new HashSet<Integer>();
        for(Integer a : a1) {
            tempSet.add(a);
        }

        for(Integer b : a2) {
            if(tempSet.contains(b)) {
                intersection.add(b);
            }
        }

        return intersection;
    }
}
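As a smaller illustration of the same merge-style idea, the sketch below walks both sorted lists with two indices and never builds a Set. It is not the code benchmarked above. It assumes both lists are sorted ascending and that the second list has no duplicates, as in the question; intersectSorted is a hypothetical name, and plain Integers stand in for the categoryIDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeIntersectionSketch {
    // Assumes sortedA and sortedB are sorted ascending and sortedB has no duplicates.
    public static List<Integer> intersectSorted(List<Integer> sortedA, List<Integer> sortedB) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < sortedA.size() && j < sortedB.size()) {
            int x = sortedA.get(i);
            int y = sortedB.get(j);
            if (x < y) {
                i++;            // x is too small to match anything left in sortedB
            } else if (x > y) {
                j++;            // y is too small to match anything left in sortedA
            } else {
                result.add(y);  // match found
                j++;            // advance only j; repeats of x are skipped by the x < y branch
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 2, 2, 3, 4, 7);
        List<Integer> b = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Prints [1, 2, 3, 4, 7]
        System.out.println(intersectSorted(a, b));
    }
}
```

Each index only moves forward, so the loop takes O(n + m) time, and the output stays in sorted order.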

What is the most efficient way to find the unique intersection of two different ArrayLists?
