我有这段代码生成HashSet
并在其上调用removeAll()
。我创建了一个类A
,它只是一个int的包装器,它记录调用equals
的次数 - 程序输出该数字。
import java.util.*;
class A {
int x;
static int equalsCalls;
A(int x) {
this.x = x;
}
@Override
public int hashCode() {
return x;
}
@Override
public boolean equals(Object o) {
equalsCalls++;
if (!(o instanceof A)) return false;
return x == ((A)o).x;
}
public static void main(String[] args) {
int setSize = Integer.parseInt(args[0]);
int listSize = Integer.parseInt(args[1]);
Set<A> s = new HashSet<A>();
for (int i = 0; i < setSize; i ++)
s.add(new A(i));
List<A> l = new LinkedList<A>();
for (int i = 0; i < listSize; i++)
l.add(new A(i));
s.removeAll(l);
System.out.println(A.equalsCalls);
}
}
事实证明,removeAll
操作与我预期的不是线性的:
$ java A 10 10
55
$ java A 100 100
5050
$ java A 1000 1000
500500
事实上,它似乎是二次方的。为什么呢?
用s.removeAll(l);
替换行for (A b : l) s.remove(b);
会修复它以我期望的方式行事:
$ java A 10 10
10
$ java A 100 100
100
$ java A 1000 1000
1000
为什么?
这是显示二次曲线的图表:
x和y轴是Java程序的两个参数。 z轴是A.equals
次呼叫的数量。
该图表是由此Asymptote程序生成的:
import graph3;
import grid3;
import palette;
currentprojection=orthographic(0.8,1,1);
size(400,300,IgnoreAspect);
defaultrender.merge=true;
real[][] a = new real[][]{
new real[]{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
new real[]{0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1},
new real[]{0,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3},
new real[]{0,1,2,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6},
new real[]{0,1,2,3,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10},
new real[]{0,1,2,3,4,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15},
new real[]{0,1,2,3,4,5,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21},
new real[]{0,1,2,3,4,5,6,28,28,28,28,28,28,28,28,28,28,28,28,28,28},
new real[]{0,1,2,3,4,5,6,7,36,36,36,36,36,36,36,36,36,36,36,36,36},
new real[]{0,1,2,3,4,5,6,7,8,45,45,45,45,45,45,45,45,45,45,45,45},
new real[]{0,1,2,3,4,5,6,7,8,9,55,55,55,55,55,55,55,55,55,55,55},
new real[]{0,1,2,3,4,5,6,7,8,9,10,66,66,66,66,66,66,66,66,66,66},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,78,78,78,78,78,78,78,78,78},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,91,91,91,91,91,91,91,91},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,105,105,105,105,105,105,105},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,120,120,120,120,120,120},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,136,136,136,136,136},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,153,153,153,153},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,171,171,171},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,190,190},
new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,210},
};
surface s=surface(a,(-1/2,-1/2),(1/2,1/2),Spline);
draw(s,mean(palette(s.map(zpart),Rainbow())),black);
//grid3(XYZgrid);
数组a
是由这个Haskell程序生成的:
import System.Process
import Data.List
f :: Integer -> Integer -> IO Integer
f x y = fmap read $ readProcess "/usr/bin/java" ["A", show x, show y] ""
g :: [[Integer]] -> [String]
g xss = map (\xs -> "new real[]{" ++ intercalate "," (map show xs) ++ "},") xss
main = do
let n = 20
xs <- mapM (\x -> mapM (\y -> f x y) [0..n]) [0..n]
putStrLn $ unlines $ g xs
答案 0 :(得分:11)
removeAll
工作所需的时间取决于您传递的收集类型。当您查看removeAll
的实现时:
public boolean removeAll(Collection<?> c) {
boolean modified = false;
if (size() > c.size()) {
for (Iterator<?> i = c.iterator(); i.hasNext(); )
modified |= remove(i.next());
} else {
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next())) {
i.remove();
modified = true;
}
}
}
return modified;
}
可以看到,当HashSet
和集合具有相同的大小时,它会迭代HashSet
并为每个元素调用c.contains
。由于您使用LinkedList
作为参数,因此这是每个元素的O(n)操作。因为它需要为被移除的n个元素中的每个元素执行此操作,所以结果是O(n 2 )操作。
如果您将LinkedList
替换为提供contains
更高效实施的集合,那么removeAll
的效果应相应提高。如果你使HashSet
大于集合,那么性能也应该大幅提升,因为它会迭代集合。