Question

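In my test automation I have no access to XML or to a database. I want to find duplicate records in a specific column of a grid. The grid has 20,000 records. The only problem is that, because we cannot query any database, I have to page through the grid, and changing pages is slow: each page loads 50 records. With 20,000 records this becomes a performance problem.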

Answer 1

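Create a HashMap<Integer, ArrayList<YourObject>>. As you go through the objects, each time you come across one with the same key, look that key up in the map and add the object to its ArrayList.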
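The grouping idea above could be sketched roughly as follows. This is only an illustration under assumptions: `Row`, `columnValue`, and `findDuplicates` are hypothetical names standing in for your own grid row type and the column you are checking, and the rows are assumed to have already been read from the grid into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupDuplicates {

    // Hypothetical stand-in for one grid row; only the checked column matters here.
    static class Row {
        final int columnValue;
        final String label;

        Row(int columnValue, String label) {
            this.columnValue = columnValue;
            this.label = label;
        }
    }

    // Groups rows by column value, then keeps only the groups holding more than one row.
    static Map<Integer, List<Row>> findDuplicates(List<Row> rows) {
        Map<Integer, List<Row>> groups = new HashMap<>();
        for (Row row : rows) {
            List<Row> bucket = groups.get(row.columnValue);
            if (bucket == null) {
                bucket = new ArrayList<>();
                groups.put(row.columnValue, bucket);
            }
            bucket.add(row);
        }

        Map<Integer, List<Row>> duplicates = new HashMap<>();
        for (Map.Entry<Integer, List<Row>> entry : groups.entrySet()) {
            if (entry.getValue().size() > 1) {
                duplicates.put(entry.getKey(), entry.getValue());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "a"), new Row(2, "b"), new Row(1, "c"));
        Map<Integer, List<Row>> duplicates = findDuplicates(rows);
        System.out.println("Duplicated column values: " + duplicates.keySet());
    }
}
```

Keeping the full list per key, rather than just a flag, lets the test report which rows collided, at the cost of holding every row in memory.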

Answer 2

Once you have generated this result, you can cache it so that you don't have to regenerate it on every page visit. But at 2 ms, you might not bother.

Here is an example with timing:

import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DuplicateTiming {

    static class MyRecord {
        String text;
        int id;
        double d;

        public MyRecord(String text, int id, double d) {
            this.text = text;
            this.id = id;
            this.d = d;
        }

        public int getId() {
            return id;
        }
    }

    public static void main(String[] args) {
        // Repeat the run so that later iterations measure warmed-up code.
        for (int t = 0; t < 100; t++) {
            long start = System.nanoTime();
            Random rand = new Random();
            // Generate 20,000 records with random ids and group them by id,
            // keeping the first record seen for each id.
            Map<Integer, MyRecord> map = IntStream.range(0, 20000)
                    .mapToObj(i -> new MyRecord("text-" + i, rand.nextInt(i + 1), i))
                    .collect(Collectors.groupingBy(MyRecord::getId,
                            Collectors.reducing(null, (a, b) -> a == null ? b : a)));
            long time = System.nanoTime() - start;
            System.out.printf("Took %.1f ms to generate and collect duplicates%n", time / 1e6);
        }
    }
}

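This test takes 2.0 ms to generate and collate the duplicate records. You could write the same code in Java 7; it would take longer to write, but it would not run any slower. It would also be faster if it didn't have to generate the records.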
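For readers on Java 7, a plain loop along these lines might serve as the equivalent. This is a sketch, not the answerer's code: `FirstPerIdLoop` and `firstPerId` are hypothetical names, `MyRecord` is repeated only so the example compiles on its own, and the records are generated before the timer starts, so its timings are not comparable to the figure quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class FirstPerIdLoop {

    // Same shape as MyRecord in the answer, repeated so this sketch is self-contained.
    static class MyRecord {
        final String text;
        final int id;
        final double d;

        MyRecord(String text, int id, double d) {
            this.text = text;
            this.id = id;
            this.d = d;
        }

        int getId() {
            return id;
        }
    }

    // Keeps the first record seen for each id; later records with the same id are skipped.
    static Map<Integer, MyRecord> firstPerId(Iterable<MyRecord> records) {
        Map<Integer, MyRecord> map = new HashMap<>();
        for (MyRecord record : records) {
            if (!map.containsKey(record.getId())) {
                map.put(record.getId(), record);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        List<MyRecord> records = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            records.add(new MyRecord("text-" + i, rand.nextInt(i + 1), i));
        }

        // Only the collection step is timed here; record generation happens above.
        long start = System.nanoTime();
        Map<Integer, MyRecord> map = firstPerId(records);
        long time = System.nanoTime() - start;
        System.out.printf("Took %.1f ms to collect %d distinct ids%n", time / 1e6, map.size());
    }
}
```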

For comparison, I tried it in parallel:

Map<Integer, MyRecord> map = IntStream.range(0, 20000).parallel()
    .mapToObj(i -> new MyRecord("text-" + i, rand.nextInt(i+1), i))
    .collect(Collectors.groupingByConcurrent(MyRecord::getId,
            Collectors.reducing(null, (a, b) -> a == null ? b : a)));

But now it takes 16 ms. :P

Answer 3

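Here is a basic option. For demonstration purposes, I created a list with a little over 20,000 records and then checked it for duplicates; the result was 29 ms.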

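The basic idea is to scan your values and, for each one, check whether it is unique. If it is, put it in the "unique" bucket that you compare against; otherwise, put it in the duplicates bucket.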

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;


public class FindDuplicates {

    /**
     * @param args
     */
    public static void main(String[] args) {

        List<String> values = new ArrayList<String>();
        Set<String> unique = new HashSet<String>();
        Set<String> duplicates = new HashSet<String>();

        values.add("1");
        values.add("2");
        values.add("3");

        for(int i=0;i<=20000;i++)
        {
            values.add(Integer.toString(i));
        }

        values.add("1");
        values.add("2");
        values.add("4");

        long before = System.currentTimeMillis();

        for(String str : values)
        {
            if(unique.contains(str))
            {
                duplicates.add(str);
            }
            else
            {
                unique.add(str);
            }
        }

        long after = System.currentTimeMillis();

        System.out.println("Processing time: " + (after-before));

        System.out.println("total values: " + values.size());
        System.out.println("total unique: " + unique.size());
        System.out.println("total duplicates: " + duplicates.size());
    }

}
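A slightly more compact variant of the same scan is possible because `Set.add` returns `false` when the element is already present, which lets the check and the insert collapse into one call. The sketch below assumes that behavior is all you need; `FindDuplicatesCompact` and `findDuplicates` are hypothetical names, not part of the answer above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FindDuplicatesCompact {

    // Returns every value that occurs more than once in the input.
    static Set<String> findDuplicates(Iterable<String> values) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();
        for (String value : values) {
            // add() returns false if the value was already in the set.
            if (!seen.add(value)) {
                duplicates.add(value);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("1", "2", "3", "1", "2", "4");
        System.out.println("duplicates: " + findDuplicates(values));
    }
}
```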

Find duplicate records quickly in a grid with 20,000 records, without database access

3 Answers: