Question

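I have two collections of the same type of object, Collection<Foo> oldSet and Collection<Foo> newSet. The required logic is as follows:

if foo is in(*) oldSet but not in newSet, call doRemove(foo)
else if foo is not in oldSet but is in newSet, call doAdd(foo)
else if foo is in both collections but has been modified, call doUpdate(oldFoo, newFoo)
else if !foo.activated && foo.startDate >= now, call doStart(foo)
else if foo.activated && foo.endDate <= now, call doEnd(foo)

(*) "in" means that the unique identifier matches, not necessarily the content.

The current (legacy) code does many comparisons to work out removeSet, addSet, updateSet, startSet and endSet, and then loops over each of them to act on every item.

The code is quite messy (partly because I have already left out some spaghetti logic) and I am trying to refactor it. Some more background:

As far as I know, oldSet and newSet are actually backed by ArrayList
Each set contains fewer than 100 items, most likely topping out at about 20
This code is called frequently (millions of times per day), although the two sets seldom differ

My questions:

If I convert oldSet and newSet into HashMap<Foo> (order is not a concern here), with the IDs as keys, would that make the code easier to read and the comparison easier? How much time & memory performance is lost in the conversion?
Would iterating over the two sets and performing the appropriate operation be more efficient and concise?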

Answer 1

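Apache's commons.collections library has a CollectionUtils class that provides easy-to-use methods for manipulating and checking Collections, such as intersection, difference, and union.

See the org.apache.commons.collections.CollectionUtils API documentation.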

Answer 2

You can use Java 8 streams, for example:

set1.stream().filter(s -> set2.contains(s)).collect(Collectors.toSet());

Or use the Sets class from Guava:

Set<String> intersection = Sets.intersection(set1, set2);
Set<String> difference = Sets.difference(set1, set2);
Set<String> symmetricDifference = Sets.symmetricDifference(set1, set2);
Set<String> union = Sets.union(set1, set2);
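
The snippets above rely on equals(), while the question defines membership by ID. As a rough sketch of how the same stream idiom could be applied to IDs, assuming a hypothetical Foo with an int id field (the class name StreamDiffSketch and the helper missingFrom are made up for illustration):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StreamDiffSketch {
    // Hypothetical minimal Foo: only the id matters for membership.
    static final class Foo {
        final int id;
        Foo(int id) { this.id = id; }
    }

    // Returns the elements of a whose id does not occur in b.
    static List<Foo> missingFrom(Collection<Foo> a, Collection<Foo> b) {
        Set<Integer> idsInB = b.stream().map(f -> f.id).collect(Collectors.toSet());
        return a.stream()
                .filter(f -> !idsInB.contains(f.id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Foo> oldSet = Arrays.asList(new Foo(1), new Foo(2), new Foo(3));
        List<Foo> newSet = Arrays.asList(new Foo(2), new Foo(3), new Foo(4));
        List<Foo> removed = missingFrom(oldSet, newSet); // candidates for doRemove
        List<Foo> added = missingFrom(newSet, oldSet);   // candidates for doAdd
        System.out.println(removed.size() + " removed, " + added.size() + " added");
        // prints "1 removed, 1 added"
    }
}
```

Collecting the IDs into a Set first keeps each membership check cheap, though for collections of around 20 items the difference is unlikely to matter.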

Answer 3

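I have created an approximation of what I think you are looking for, using only the Collections Framework in Java. Frankly, I think it is probably overkill, as @Mike Deck points out. For such a small set of items to compare and process, I think arrays would be a better choice from a procedural standpoint, but here is my pseudo-coded (because I'm lazy) solution. I am assuming that the Foo class is comparable based on its unique id rather than on all of the data in its contents: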

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.Date;

// Assumptions: Foo.equals()/hashCode() use only the unique id, the date
// fields are java.util.Date, and doRemove/doAdd/doUpdate/doStart/doEnd
// are defined elsewhere.
Collection<Foo> oldSet = ...;
Collection<Foo> newSet = ...;

// Elements of a that are not in b (according to equals()).
private Collection<Foo> difference(Collection<Foo> a, Collection<Foo> b) {
    Collection<Foo> result = new ArrayList<Foo>(a); // copy; Collection has no clone()
    result.removeAll(b);
    return result;
}

// Elements of a that are also in b (according to equals()).
private Collection<Foo> intersection(Collection<Foo> a, Collection<Foo> b) {
    Collection<Foo> result = new ArrayList<Foo>(a);
    result.retainAll(b);
    return result;
}

// Collection has no get(); a linear scan is acceptable for ~20 items.
private Foo find(Collection<Foo> c, Foo key) {
    for (Foo f : c) {
        if (f.equals(key)) return f;
    }
    return null;
}

public void doWork() {
    Date now = new Date();

    // if foo is in(*) oldSet but not newSet, call doRemove(foo)
    for (Foo foo : difference(oldSet, newSet)) {
        doRemove(foo);
    }

    // else if foo is not in oldSet but in newSet, call doAdd(foo)
    for (Foo foo : difference(newSet, oldSet)) {
        doAdd(foo);
    }

    // else if foo is in both collections but modified, call doUpdate(oldFoo, newFoo)
    Comparator<Foo> comp = new Comparator<Foo>() {
        @Override
        public int compare(Foo f1, Foo f2) {
            int c = Boolean.compare(f1.activated, f2.activated);
            if (c != 0) return c;
            c = f1.startDate.compareTo(f2.startDate);
            if (c != 0) return c;
            // ... compare the remaining content fields the same way
            return 0;
        }
    };
    for (Foo oldFoo : intersection(oldSet, newSet)) {
        Foo newFoo = find(newSet, oldFoo);
        if (comp.compare(oldFoo, newFoo) != 0) {
            doUpdate(oldFoo, newFoo);
        } else {
            // else if !foo.activated && foo.startDate >= now, call doStart(foo)
            if (!oldFoo.activated && oldFoo.startDate.compareTo(now) >= 0) doStart(oldFoo);

            // else if foo.activated && foo.endDate <= now, call doEnd(foo)
            if (oldFoo.activated && oldFoo.endDate.compareTo(now) <= 0) doEnd(oldFoo);
        }
    }
}

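As for your questions: If I convert oldSet and newSet into HashMap (order is not a concern here), with the IDs as keys, would that make the code easier to read and the comparison easier? How much time & memory performance is lost in the conversion? I think you would probably make the code more readable by using a Map, BUT you would probably use more memory and time during the conversion.

Would iterating over the two sets and performing the appropriate operation be more efficient and concise? Yes, this would be the best of both worlds, especially if you follow @Mike Sharek's advice and roll your own List with specialized methods, or use something like the Visitor design pattern to run through your collection and process each item.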
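
To make the Map option from the question more concrete, here is a minimal sketch, not a measured recommendation. It assumes a hypothetical Foo with an int id and a String payload standing in for "the content", and it records the action names instead of calling the real doAdd/doUpdate/doRemove. The doStart/doEnd checks are left out for brevity. Only the old collection is indexed; the new one is simply iterated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapDiffSketch {
    // Hypothetical Foo: id is the identity, payload stands in for the content.
    static final class Foo {
        final int id;
        final String payload;
        Foo(int id, String payload) { this.id = id; this.payload = payload; }
    }

    // Returns a log of the actions that would be taken, e.g. "remove 1".
    static List<String> diff(Collection<Foo> oldSet, Collection<Foo> newSet) {
        // Index the old elements by id; LinkedHashMap keeps the output order stable.
        Map<Integer, Foo> oldById = new LinkedHashMap<>();
        for (Foo f : oldSet) oldById.put(f.id, f);

        List<String> actions = new ArrayList<>();
        for (Foo newFoo : newSet) {
            Foo oldFoo = oldById.remove(newFoo.id); // null if the id is new
            if (oldFoo == null) {
                actions.add("add " + newFoo.id);     // would call doAdd(newFoo)
            } else if (!oldFoo.payload.equals(newFoo.payload)) {
                actions.add("update " + newFoo.id);  // would call doUpdate(oldFoo, newFoo)
            }
        }
        // Whatever is left in the map was not matched by any new element.
        for (Foo oldFoo : oldById.values()) {
            actions.add("remove " + oldFoo.id);      // would call doRemove(oldFoo)
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Foo> oldSet = Arrays.asList(new Foo(1, "a"), new Foo(2, "b"), new Foo(3, "c"));
        List<Foo> newSet = Arrays.asList(new Foo(2, "b"), new Foo(3, "x"), new Foo(4, "d"));
        System.out.println(diff(oldSet, newSet)); // prints [update 3, add 4, remove 1]
    }
}
```

The extra cost is one map with at most as many entries as oldSet; whether that matters at millions of calls per day is something to measure rather than assume.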

Answer 4

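I would move to lists and solve it this way:

Sort both lists by ID in ascending order, using a custom Comparator if the objects in the lists are not Comparable.
Iterate over the elements of both lists as in the merge phase of the merge sort algorithm, but instead of merging the lists, apply your logic.

The code would be more or less like this: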

/* Main method */
private void execute(Collection<Foo> oldSet, Collection<Foo> newSet) {
  List<Foo> oldList = asSortedList(oldSet);
  List<Foo> newList = asSortedList(newSet);

  int oldIndex = 0;
  int newIndex = 0;
  // Iterate over both collections but not always in the same pace
  while( oldIndex < oldList.size() 
      && newIndex < newList.size())  {
    Foo oldObject = oldList.get(oldIndex);
    Foo newObject = newList.get(newIndex);

    // Your logic here
    if(oldObject.getId() < newObject.getId()) {
      doRemove(oldObject);
      oldIndex++;
    } else if( oldObject.getId() > newObject.getId() ) {
      doAdd(newObject);
      newIndex++;
    } else {
      // Same ID in both lists
      if( isModified(oldObject, newObject) ) {
        doUpdate(oldObject, newObject);
      } else {
        ... 
      }
      // Advance both indexes in either case, otherwise the loop never ends
      oldIndex++;
      newIndex++;
    }
  }// while

  // Check if there are any objects left in *oldList* or *newList*

  for(; oldIndex < oldList.size(); oldIndex++ ) {
    doRemove( oldList.get(oldIndex) );  
  }// for( oldIndex )

  for(; newIndex < newList.size(); newIndex++ ) {
    doAdd( newList.get(newIndex) );
  }// for( newIndex ) 
}// execute( oldSet, newSet )

/** Create a sorted list from a collection.
    If you perform any actions on the input collections, then you should
    always return a new list instance to keep the algorithm simple.
    Assumes Foo implements Comparable<Foo>, ordered by ID.
*/
private List<Foo> asSortedList(Collection<Foo> data) {
  List<Foo> resultList;
  if(data instanceof List) {
     resultList = (List<Foo>)data;  // note: the caller's list is sorted in place
  } else {
     resultList = new ArrayList<Foo>(data);
  }
  Collections.sort(resultList);
  return resultList;
}

Answer 5

I think the easiest way to do this is with the Apache collections API: CollectionUtils.subtract(list1, list2), as long as the lists are of the same type.

Answer 6

public static boolean doCollectionsContainSameElements(
        Collection<Integer> c1, Collection<Integer> c2){

    if (c1 == null || c2 == null) {
        return false;
    }
    else if (c1.size() != c2.size()) {
        return false;
    } else {    
        return c1.containsAll(c2) && c2.containsAll(c1);
    }       
}
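
One thing to check before relying on the method above: since the collections in the question are backed by ArrayList, they may contain duplicates, and a size check plus containsAll in both directions does not distinguish how often each value occurs. The sketch below reproduces the method and contrasts it with a hypothetical count-based alternative (sameMultiset and counts are names invented here):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SameElementsSketch {
    // The method from the answer, with the same behavior.
    static boolean doCollectionsContainSameElements(
            Collection<Integer> c1, Collection<Integer> c2) {
        if (c1 == null || c2 == null) return false;
        if (c1.size() != c2.size()) return false;
        return c1.containsAll(c2) && c2.containsAll(c1);
    }

    // Counts how many times each value occurs.
    static Map<Integer, Integer> counts(Collection<Integer> c) {
        Map<Integer, Integer> m = new HashMap<>();
        for (Integer x : c) m.merge(x, 1, Integer::sum);
        return m;
    }

    // Hypothetical alternative: equal only if every value occurs equally often.
    static boolean sameMultiset(Collection<Integer> c1, Collection<Integer> c2) {
        return c1 != null && c2 != null && counts(c1).equals(counts(c2));
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 2);
        List<Integer> b = Arrays.asList(1, 2, 2);
        System.out.println(doCollectionsContainSameElements(a, b)); // prints true
        System.out.println(sameMultiset(a, b));                     // prints false
    }
}
```

If the collections are known to hold unique IDs, the original method is sufficient and the count-based variant is unnecessary.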

Answer 7

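For a set that small, it is generally not worth converting from an array to a HashMap/set. In fact, you are probably best off keeping them in an array, sorting them by key, and iterating over both lists simultaneously to do the comparison.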

Answer 8

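To compare lists or sets, we can use Arrays.equals(Object[], Object[]). It checks only the values, comparing the arrays element by element with equals(), so the order of the elements matters. To get the Object[], we can use the Collection.toArray() method.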
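
A small sketch of this approach, showing that the comparison is positional. The helper name sameSequence is made up for illustration:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class ArraysEqualsSketch {
    // True if both collections yield equal elements in the same iteration order.
    static boolean sameSequence(Collection<?> a, Collection<?> b) {
        return Arrays.equals(a.toArray(), b.toArray());
    }

    public static void main(String[] args) {
        List<String> x = Arrays.asList("a", "b");
        List<String> y = Arrays.asList("a", "b");
        List<String> z = Arrays.asList("b", "a");
        System.out.println(sameSequence(x, y)); // prints true
        System.out.println(sameSequence(x, z)); // prints false: same values, different order
    }
}
```

For collections without a defined order, such as a HashSet, the result depends on iteration order, so sorting both arrays first (or using a set-based comparison) may be the safer choice.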

How best to compare two collections in Java and act on them?
