A good sorted list for Java

Question

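I'm looking for a good sorted list for Java. Googling gives me some hints about using TreeSet/TreeMap, but these components lack one thing: random access to the elements in the collection. For example, I want to access the n-th element of the sorted set, but with a TreeSet I have to iterate over the other n-1 elements to get there. That would be wasteful, since I have thousands of elements in my Set.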

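Basically, I'm looking for something similar to the sorted list in .NET: able to add elements quickly, remove elements quickly, and give random access to any element in the list.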

Is this kind of sorted list implemented somewhere? Thanks.

Edit

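My interest in SortedList stems from this problem: I need to maintain a list of thousands of objects (which can grow to hundreds of thousands). These objects are persisted to a database. I want to randomly select a few dozen elements from the whole list. So I tried to maintain a separate in-memory list containing the primary keys (longs) of all objects. Whenever an object is added to or deleted from the database, I need to add or remove its key from the list. I'm currently using an ArrayList, but I worry that ArrayList won't scale as the number of records grows. (Imagine having to iterate over hundreds of thousands of elements every time an object is deleted from the database.) Back when I did .NET programming, I would use a sorted List (List is a .NET class that, once its Sorted property is set to true, maintains the order of its elements and uses binary search to make removing/inserting elements very fast). I hoped to find something similar in the Java BCL, but unfortunately I didn't find a good match.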
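The approach described above (keys kept in order, located by binary search) can be sketched with the JDK alone. This is a minimal sketch assuming `long` keys held in an `ArrayList<Long>`; the class name `SortedKeyList` is hypothetical. `Collections.binarySearch` finds a key, or the position where it would go, in O(log n). The insert or remove itself still shifts the tail of the backing array, so each update remains O(n) in the worst case, but as a single array copy rather than an element-by-element scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: keeps primary keys in ascending order in an ArrayList.
public class SortedKeyList {
    private final List<Long> keys = new ArrayList<>();

    // Inserts the key at its sorted position; duplicates are ignored.
    public boolean add(long key) {
        int pos = Collections.binarySearch(keys, key);
        if (pos >= 0) {
            return false; // already present
        }
        keys.add(-pos - 1, key); // binarySearch returns -(insertionPoint) - 1
        return true;
    }

    // Removes the key if present, locating it by binary search instead of a scan.
    public boolean remove(long key) {
        int pos = Collections.binarySearch(keys, key);
        if (pos < 0) {
            return false;
        }
        keys.remove(pos); // remove(int): removes by index
        return true;
    }

    public long get(int index) {
        return keys.get(index); // O(1) random access
    }

    public int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        SortedKeyList list = new SortedKeyList();
        for (long k : new long[] {42L, 7L, 19L, 7L, 100L}) {
            list.add(k);
        }
        list.remove(19L);
        System.out.println(list.keys); // [7, 42, 100]
    }
}
```

Whether the remaining array shift matters at a few hundred thousand keys is worth measuring rather than assuming.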

Answer 1

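It seems you want a list structure with very fast removal and fast random access by index (not by key). ArrayList gives you the latter; HashMap or TreeMap gives you the former.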

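There is one structure in Apache Commons Collections that may be just what you're looking for: TreeList. The JavaDoc states that it is optimized for fast insertion and removal at any index in the list. If you also need generics, though, this won't help you (that applies to Commons Collections 3.x; Commons Collections 4 provides a generic TreeList<E> in org.apache.commons.collections4.list).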

Answer 2

This is the SortedList implementation I'm using. Maybe it helps with your problem:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
/**
 * This class is a List implementation which sorts the elements using the
 * comparator specified when constructing a new instance.
 * 
 * @param <T>
 */
public class SortedList<T> extends ArrayList<T> {
    /**
     * Needed for serialization.
     */
    private static final long serialVersionUID = 1L;
    /**
     * Comparator used to sort the list.
     */
    private Comparator<? super T> comparator = null;
    /**
     * Construct a new instance with the list elements sorted in their
     * {@link java.lang.Comparable} natural ordering.
     */
    public SortedList() {
    }
    /**
     * Construct a new instance using the given comparator.
     * 
     * @param comparator
     */
    public SortedList(Comparator<? super T> comparator) {
        this.comparator = comparator;
    }
    /**
     * Construct a new instance containing the elements of the specified
     * collection with the list elements sorted in their
     * {@link java.lang.Comparable} natural ordering.
     * 
     * @param collection
     */
    public SortedList(Collection<? extends T> collection) {
        addAll(collection);
    }
    /**
     * Construct a new instance containing the elements of the specified
     * collection with the list elements sorted using the given comparator.
     * 
     * @param collection
     * @param comparator
     */
    public SortedList(Collection<? extends T> collection, Comparator<? super T> comparator) {
        this(comparator);
        addAll(collection);
    }
    /**
     * Add a new entry to the list. The insertion point is calculated using the
     * comparator.
     * 
     * @param paramT
     * @return <code>true</code> if this collection changed as a result of the call.
     */
    @Override
    public boolean add(T paramT) {
        int initialSize = this.size();
        // Retrieves the position of an existing, equal element or the 
        // insertion position for new elements (negative).
        int insertionPoint = Collections.binarySearch(this, paramT, comparator);
        super.add((insertionPoint > -1) ? insertionPoint : (-insertionPoint) - 1, paramT);
        return (this.size() != initialSize);
    }
    /**
     * Adds all elements in the specified collection to the list. Each element
     * will be inserted at the correct position to keep the list sorted.
     * 
     * @param paramCollection
     * @return <code>true</code> if this collection changed as a result of the call.
     */
    @Override
    public boolean addAll(Collection<? extends T> paramCollection) {
        boolean result = false;
        if (paramCollection.size() > 4) {
            result = super.addAll(paramCollection);
            Collections.sort(this, comparator);
        }
        else {
            for (T paramT:paramCollection) {
                result |= add(paramT);
            }
        }
        return result;
    }
    /**
     * Check, if this list contains the given Element. This is faster than the
     * {@link #contains(Object)} method, since it is based on binary search.
     * 
     * @param paramT
     * @return <code>true</code>, if the element is contained in this list;
     * <code>false</code>, otherwise.
     */
    public boolean containsElement(T paramT) {
        return (Collections.binarySearch(this, paramT, comparator) > -1);
    }
    /**
     * @return The comparator used for sorting this list.
     */
    public Comparator<? super T> getComparator() {
        return comparator;
    }
    /**
     * Assign a new comparator and sort the list using this new comparator.
     * 
     * @param comparator
     */
    public void setComparator(Comparator<? super T> comparator) {
        this.comparator = comparator;
        Collections.sort(this, comparator);
    }
}

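This solution is very flexible and uses existing Java functions:

Completely based on generics
Uses java.util.Collections for finding and inserting list elements
Option to use a custom Comparator for sorting the list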

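Some notes:

This sorted list is not synchronized, since it inherits from java.util.ArrayList. Use Collections.synchronizedList if you need synchronization (see the Java documentation of java.util.ArrayList for details).
The initial solution was based on java.util.LinkedList. For better performance, specifically for finding the insertion point (Logan's comment) and for faster get operations (https://dzone.com/articles/arraylist-vs-linkedlist-vs), it has been changed to java.util.ArrayList.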
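One caveat the notes above don't cover: because the class extends ArrayList, inherited positional methods such as add(int, T), set(int, T) and addAll(int, Collection) can still put elements out of order. Below is a sketch of one way to guard against that, assuming natural ordering for brevity (the comparator support of the class above is left out). `GuardedSortedList` is a hypothetical name, and the set of overridden methods is not exhaustive: replaceAll and sort(Comparator), for example, would also need attention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical ArrayList-based sorted list (natural ordering) that rejects
// positional mutators, so callers cannot place elements out of order.
public class GuardedSortedList<T extends Comparable<? super T>> extends ArrayList<T> {
    private static final long serialVersionUID = 1L;

    // Inserts at the position found by binary search, as in the class above.
    @Override
    public boolean add(T element) {
        int pos = Collections.binarySearch(this, element);
        super.add(pos >= 0 ? pos : -pos - 1, element); // super call bypasses the guard below
        return true;
    }

    // ArrayList.addAll would append unsorted, so route every element through add(T).
    @Override
    public boolean addAll(Collection<? extends T> c) {
        for (T element : c) {
            add(element);
        }
        return !c.isEmpty();
    }

    @Override
    public void add(int index, T element) {
        throw new UnsupportedOperationException("position is determined by the sort order");
    }

    @Override
    public T set(int index, T element) {
        throw new UnsupportedOperationException("position is determined by the sort order");
    }

    @Override
    public boolean addAll(int index, Collection<? extends T> c) {
        throw new UnsupportedOperationException("position is determined by the sort order");
    }

    public static void main(String[] args) {
        GuardedSortedList<Integer> list = new GuardedSortedList<>();
        list.addAll(Arrays.asList(5, 1, 4));
        list.add(3);
        System.out.println(list); // [1, 3, 4, 5]
        try {
            list.set(0, 99);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing keeps the List contract honest: a caller that asks for a specific position learns immediately that the list decides positions itself.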

Answer 3

Phuong:

Sorting 40,000 random numbers:

0.022 seconds

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;


public class test
{
    public static void main(String[] args)
    {
        List<Integer> nums = new ArrayList<Integer>();
        Random rand = new Random();
        for( int i = 0; i < 40000; i++ )
        {
            nums.add( rand.nextInt(Integer.MAX_VALUE) );
        }

        long start = System.nanoTime();
        Collections.sort(nums);
        long end = System.nanoTime();

        System.out.println((end-start)/1e9);
    }
}

Since, according to your problem statement, you rarely need to sort, this is probably more efficient.
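A sketch of the "sort only when needed" idea, assuming updates arrive in batches and ordered reads are comparatively rare: append without sorting, and sort once, lazily, before the next indexed read. `LazySortedList` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: cheap appends, at most one sort per batch of additions.
public class LazySortedList {
    private final List<Long> items = new ArrayList<>();
    private boolean dirty = false; // true when items may be out of order

    public void add(long value) {
        items.add(value); // amortized O(1) append, no sorting yet
        dirty = true;
    }

    // Removing an element keeps the remaining ones in their relative order.
    public boolean remove(long value) {
        return items.remove(Long.valueOf(value)); // O(n) search and shift
    }

    // Sorts (O(n log n)) only if something was added since the last sort.
    public long get(int index) {
        if (dirty) {
            Collections.sort(items);
            dirty = false;
        }
        return items.get(index);
    }

    public int size() {
        return items.size();
    }

    public static void main(String[] args) {
        LazySortedList list = new LazySortedList();
        list.add(30);
        list.add(10);
        list.add(20);
        System.out.println(list.get(0)); // 10
    }
}
```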

Answer 4

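Depending on how you use the list, it may be worth using a TreeSet and calling its toArray() method at the end. I had a case where I needed a sorted list, and I found TreeSet + toArray() to be much faster than adding to an array and merge sorting at the end.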
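A sketch of the TreeSet + toArray() approach using only the JDK, assuming the elements are `Long` IDs. Keep in mind that a TreeSet silently drops duplicates, and that the array is a snapshot: later changes to the set are not reflected in it. `TreeSetSnapshot` and `snapshot` are hypothetical names.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical demo class for the TreeSet + toArray() pattern.
public class TreeSetSnapshot {
    // One in-order walk of the tree, O(n); the array then gives O(1) access by index.
    static Long[] snapshot(TreeSet<Long> ids) {
        return ids.toArray(new Long[0]);
    }

    public static void main(String[] args) {
        TreeSet<Long> ids = new TreeSet<>();
        for (long id : new long[] {42L, 7L, 19L, 7L, 100L}) {
            ids.add(id); // O(log n); the duplicate 7 is stored only once
        }
        ids.remove(100L); // O(log n)
        Long[] sorted = snapshot(ids);
        System.out.println(Arrays.toString(sorted)); // [7, 19, 42]
        System.out.println(sorted[1]);               // 19
    }
}
```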

Answer 5

The SortedList decorator from Java Happy Libraries can be used to decorate TreeList from Apache Collections. This produces a new list whose performance is comparable to TreeSet. https://sourceforge.net/p/happy-guys/wiki/Sorted%20List/

Answer 6

GlazedLists has a very, very good sorted list implementation.

Answer 7

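How about using a HashMap? Insertion, deletion and retrieval are all O(1) operations on average. If you want everything sorted, you can get the list of values in the Map and run them through an O(n log n) sorting algorithm.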
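A sketch of that approach, assuming `Long` primary keys mapped to the stored objects (a `String` stands in for the object). HashMap operations are O(1) on average rather than guaranteed, and each ordered view is a fresh O(n log n) sort. The sketch sorts the keys; the values can be sorted the same way if they are Comparable. `HashMapThenSort` and `sortedKeys` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: unordered map for updates, sort on demand.
public class HashMapThenSort {
    // Copies the keys and sorts them: O(n log n) each time an ordered view is needed.
    static List<Long> sortedKeys(Map<Long, ?> map) {
        List<Long> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical: primary key -> stored object (a String stands in for it).
        Map<Long, String> byId = new HashMap<>();
        byId.put(42L, "a");
        byId.put(7L, "b");
        byId.put(19L, "c");
        byId.remove(19L); // O(1) on average, no scan
        System.out.println(sortedKeys(byId)); // [7, 42]
    }
}
```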

Edit

A quick search turned up LinkedHashMap, which maintains the insertion order of your keys. It's not an exact solution, but it's pretty close.
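A quick illustration of why LinkedHashMap is only close: it remembers insertion order, not sorted order, and like HashMap it has no get(int), so reaching the n-th entry still means iterating. `LinkedHashMapOrder` and `keysInIterationOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class showing LinkedHashMap's iteration order.
public class LinkedHashMapOrder {
    // Returns the keys in the map's iteration order.
    static List<Long> keysInIterationOrder(Map<Long, ?> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<Long, String> byId = new LinkedHashMap<>();
        byId.put(42L, "a");
        byId.put(7L, "b");
        byId.put(19L, "c");
        // Insertion order, not key order; and there is no byId.get(index).
        System.out.println(keysInIterationOrder(byId)); // [42, 7, 19]
    }
}
```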

Answer 8

In general, you can't have constant-time lookup together with log-time deletion/insertion, but if you're happy with log-time lookups, you can use a SortedList.

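Not sure whether you'll trust my coding, but I recently wrote a SortedList implementation in Java, which you can download from http://www.scottlogic.co.uk/2010/12/sorted_lists_in_java/. This implementation lets you look up the i-th element of the list in log time.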
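To show how log-time access to the i-th element can work, here is a minimal sketch of an order-statistic tree written as a randomized treap. It is not the linked implementation, which is not reproduced here; `IndexedTreap` is a hypothetical name and the keys are assumed to be `long`. Each node caches the size of its subtree, so get(i) descends one root-to-leaf path, choosing left or right by comparing i with the left subtree's size. Duplicates are allowed.

```java
import java.util.Random;

// Hypothetical order-statistic treap over long keys (duplicates allowed).
// Each node caches its subtree size, so add, remove and get(index) all run
// in O(log n) expected time.
public class IndexedTreap {
    private static final class Node {
        final long key;
        final int priority;
        int size = 1; // number of nodes in this subtree
        Node left, right;
        Node(long key, int priority) { this.key = key; this.priority = priority; }
    }

    private final Random random = new Random();
    private Node root;

    private static int size(Node n) { return n == null ? 0 : n.size; }

    private static Node update(Node n) {
        n.size = 1 + size(n.left) + size(n.right);
        return n;
    }

    // Splits t into {keys < key, keys >= key}, or {keys <= key, keys > key} when inclusive.
    private static Node[] split(Node t, long key, boolean inclusive) {
        if (t == null) return new Node[] {null, null};
        if (t.key < key || (inclusive && t.key == key)) {
            Node[] parts = split(t.right, key, inclusive);
            t.right = parts[0];
            return new Node[] {update(t), parts[1]};
        }
        Node[] parts = split(t.left, key, inclusive);
        t.left = parts[1];
        return new Node[] {parts[0], update(t)};
    }

    // Merges two treaps; every key in a must be <= every key in b.
    private static Node merge(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        if (a.priority > b.priority) {
            a.right = merge(a.right, b);
            return update(a);
        }
        b.left = merge(a, b.left);
        return update(b);
    }

    public void add(long key) {
        Node[] parts = split(root, key, false);
        root = merge(merge(parts[0], new Node(key, random.nextInt())), parts[1]);
    }

    // Removes one occurrence of key; returns false if it was not present.
    public boolean remove(long key) {
        Node[] lower = split(root, key, false);    // {< key, >= key}
        Node[] equal = split(lower[1], key, true); // {== key, > key}
        boolean found = equal[0] != null;
        if (found) {
            equal[0] = merge(equal[0].left, equal[0].right); // drop one copy
        }
        root = merge(merge(lower[0], equal[0]), equal[1]);
        return found;
    }

    // Returns the element at position index in ascending order.
    public long get(int index) {
        if (index < 0 || index >= size(root)) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node n = root;
        while (true) {
            int leftSize = size(n.left);
            if (index < leftSize) {
                n = n.left;
            } else if (index == leftSize) {
                return n.key;
            } else {
                index -= leftSize + 1;
                n = n.right;
            }
        }
    }

    public int size() { return size(root); }

    public static void main(String[] args) {
        IndexedTreap treap = new IndexedTreap();
        for (long k : new long[] {50, 10, 40, 20, 30, 20}) treap.add(k);
        treap.remove(40);
        System.out.println(treap.get(3) + " of " + treap.size()); // 30 of 5
    }
}
```

A treap was chosen here only because split/merge keeps the code short; an AVL or red-black tree augmented with subtree sizes gives the same O(log n) bounds deterministically.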

Answer 9

To test the efficiency of Konrad Holl's earlier answer, I did a quick comparison with what I thought would be the slow way of doing it:

package util.collections;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

/**
 *
 * @author Earl Bosch
 * @param <E> Comparable Element
 *
 */
public class SortedList<E extends Comparable<? super E>> implements List<E> {

    /**
     * The list of elements
     */
    private final List<E> list = new ArrayList<>();

    public E first() {
        return list.get(0);
    }

    public E last() {
        return list.get(list.size() - 1);
    }

    public E mid() {
        return list.get(list.size() >>> 1);
    }

    @Override
    public void clear() {
        list.clear();
    }

    @Override
    public boolean add(E e) {
        list.add(e);
        Collections.sort(list);
        return true;
    }

    @Override
    public int size() {
        return list.size();
    }

    @Override
    public boolean isEmpty() {
        return list.isEmpty();
    }

    @Override
    public boolean contains(Object obj) {
        return list.contains(obj);
    }

    @Override
    public Iterator<E> iterator() {
        return list.iterator();
    }

    @Override
    public Object[] toArray() {
        return list.toArray();
    }

    @Override
    public <T> T[] toArray(T[] arg0) {
        return list.toArray(arg0);
    }

    @Override
    public boolean remove(Object obj) {
        return list.remove(obj);
    }

    @Override
    public boolean containsAll(Collection<?> c) {
        return list.containsAll(c);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {

        list.addAll(c);
        Collections.sort(list);
        return true;
    }

    @Override
    public boolean addAll(int index, Collection<? extends E> c) {
        throw new UnsupportedOperationException("Not supported.");
    }

    @Override
    public boolean removeAll(Collection<?> c) {
        return list.removeAll(c);
    }

    @Override
    public boolean retainAll(Collection<?> c) {
        return list.retainAll(c);
    }

    @Override
    public E get(int index) {
        return list.get(index);
    }

    @Override
    public E set(int index, E element) {
        throw new UnsupportedOperationException("Not supported.");
    }

    @Override
    public void add(int index, E element) {
        throw new UnsupportedOperationException("Not supported.");
    }

    @Override
    public E remove(int index) {
        return list.remove(index);
    }

    @Override
    public int indexOf(Object obj) {
        return list.indexOf(obj);
    }

    @Override
    public int lastIndexOf(Object obj) {
        return list.lastIndexOf(obj);
    }

    @Override
    public ListIterator<E> listIterator() {
        return list.listIterator();
    }

    @Override
    public ListIterator<E> listIterator(int index) {
        return list.listIterator(index);
    }

    @Override
    public List<E> subList(int fromIndex, int toIndex) {
        throw new UnsupportedOperationException("Not supported.");
    }

}

Turns out it's twice as fast! I think this is because of SortedLinkList's slow get(), which makes it a poor choice for a list.

Comparison times for the same random list:

SortedLinkList：15731.460
SortedList：6895.494
ca.odell.glazedlists.SortedList：712.460
org.apache.commons.collections4.list.TreeList: 3226.546

It seems glazedlists.SortedList is really fast...

Answer 10

You don't need a sorted list. You don't need sorting at all.

"Whenever an object is added to or deleted from the database, I need to add or remove its key from the list."

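But not immediately: the removal can wait. Use an ArrayList containing the IDs of all live objects plus some percentage of deleted ones, and use a separate HashSet to keep track of the deleted objects.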

private List<ID> mostlyAliveIds = new ArrayList<>();
private Set<ID> deletedIds = new HashSet<>();

"I want to randomly select a few dozen elements from the whole list."

// checkState/checkArgument are Guava's com.google.common.base.Preconditions
ID selectOne(Random random) {
    checkState(deletedIds.size() < mostlyAliveIds.size());
    while (true) {
        int index = random.nextInt(mostlyAliveIds.size());
        ID id = mostlyAliveIds.get(index);
        if (!deletedIds.contains(id)) return id;
    }
}

Set<ID> selectSome(Random random, int count) {
    checkArgument(deletedIds.size() <= mostlyAliveIds.size() - count);
    Set<ID> result = new HashSet<>();
    while (result.size() < count) result.add(selectOne(random));
    return result;
}

To maintain the data, do something like this:

void insert(ID id) {
    // A deleted ID that is still in mostlyAliveIds only needs to be un-deleted.
    if (!deletedIds.remove(id)) mostlyAliveIds.add(id);
}

void delete(ID id) {
    if (!deletedIds.add(id)) {
        throw new IllegalStateException("Deleting a deleted element");
    }
    // Compact once deleted IDs exceed 10% of the list.
    if (deletedIds.size() > 0.1 * mostlyAliveIds.size()) {
        mostlyAliveIds.removeAll(deletedIds);
        deletedIds.clear();
    }
}

The only tricky part is insert, which has to check whether a deleted ID is being revived.

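delete ensures that no more than 10% of the elements in mostlyAliveIds are deleted IDs. When that limit is exceeded, they are all removed in one sweep (I didn't check the JDK source, but I hope they got it right), and the show goes on.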

With no more than 10% dead IDs, selectOne needs on average at most 1/0.9 ≈ 1.11 draws per returned ID, i.e. roughly 10% overhead.

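I'm pretty sure it's faster than any sorting approach, since the amortized cost per insert and delete is O(1): each O(n) compaction is preceded by more than 0.1·n deletions that pay for it.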
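The fragments above, assembled into one self-contained class as a sketch: `ID` is assumed to be `Long`, Guava's checkState/checkArgument are replaced with plain JDK exceptions, and `LazyIdPool` is a hypothetical name. Like the original, it assumes an ID is deleted only while it is live and is never inserted twice while live.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical self-contained version of the lazy-deletion scheme, with long IDs.
public class LazyIdPool {
    private final List<Long> mostlyAliveIds = new ArrayList<>();
    private final Set<Long> deletedIds = new HashSet<>();

    public void insert(long id) {
        // Reviving a deleted ID: it is still in mostlyAliveIds, so just unmark it.
        if (!deletedIds.remove(id)) {
            mostlyAliveIds.add(id);
        }
    }

    public void delete(long id) {
        if (!deletedIds.add(id)) {
            throw new IllegalStateException("Deleting a deleted element: " + id);
        }
        // Compact once more than 10% of the list is dead; removeAll with a
        // HashSet argument is one O(n) pass, amortized O(1) per delete.
        if (deletedIds.size() > 0.1 * mostlyAliveIds.size()) {
            mostlyAliveIds.removeAll(deletedIds);
            deletedIds.clear();
        }
    }

    // Rejection sampling: redraw until a live ID is hit.
    public long selectOne(Random random) {
        if (deletedIds.size() >= mostlyAliveIds.size()) {
            throw new IllegalStateException("no live IDs");
        }
        while (true) {
            long id = mostlyAliveIds.get(random.nextInt(mostlyAliveIds.size()));
            if (!deletedIds.contains(id)) {
                return id;
            }
        }
    }

    public Set<Long> selectSome(Random random, int count) {
        if (count > mostlyAliveIds.size() - deletedIds.size()) {
            throw new IllegalArgumentException("not enough live IDs");
        }
        Set<Long> result = new HashSet<>();
        while (result.size() < count) {
            result.add(selectOne(random));
        }
        return result;
    }
}
```

selectSome keeps drawing until it has count distinct live IDs, so it finishes quickly when, as in the question, count (a few dozen) is far below the number of live IDs.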
