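I have found a more efficient sorting algorithm. On uniformly distributed data its best and average case is O(N), and its worst case is O(N log N).
I need your help to tell me whether my tests are correct. My biggest question is: how do I test it on real-world data?
This question is divided into five parts: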
Java's Collections.sort is implemented as a modified merge sort. Since JDK 7 it has been replaced by TimSort. In my tests I have been working with JDK 6, the same implementation that Android uses.
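I found an interesting way to sort. I use a statistical sort, or more precisely a linear statistical sort. I assume that the values fit a "good" linear regression, so I calculate the approximate index of each value from the value itself. If several values map to the same index, I put the extra ones in a buffer array, and I sort the buffer with Collections.sort(). The idea is that the buffer is very small, so sorting it costs ~O(1). This is the difference between O(N) and O(N log N) performance: in the worst case the buffer has size N. After that, I merge my approximately sorted array with the buffer. The result is the sorted array.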
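The index estimate in `calculateIndex` below appears to follow from treating the sorted values as points on a straight line. Here is a sketch of that reading, assuming the data are roughly uniform between the minimum and the maximum. The symbols $n$ (number of elements), $m$ (minimum), $M$ (maximum), $S$ (sum) and $i(a)$ (estimated index of value $a$) are introduced here for illustration and are not the author's notation:

```latex
\begin{align*}
i(a) &\approx (n-1)\,\frac{a-m}{M-m}
  && \text{(linear interpolation between } m \text{ and } M\text{)} \\
\frac{S}{n} &\approx \frac{m+M}{2}
  \;\Longrightarrow\; M-m \approx \frac{2\,(S-n\,m)}{n}
  && \text{(mean of uniform data is the midpoint)} \\
i(a) &\approx \frac{(a-m)\,n\,(n-1)}{2\,(S-n\,m)}
  && \text{(substitute } M-m\text{)}
\end{align*}
```

The last line matches the expression in the code term by term, with `size` for $n$, `minemum` for $m$ and `sum` for $S$. It also shows why the estimate degrades on skewed data: the mean is then no longer close to the midpoint.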
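import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

public class StatisticSort {
    // Shared static state: this class is not safe for concurrent use.
    private static long minemum;
    private static long sum;

    public static void sort(List<Integer> source) {
        findMinMaxAndSum(source);
        int size = source.size();
        ArrayList<Integer> buffer = new ArrayList<Integer>();
        // Sparse array indexed by estimated position; unused slots stay null.
        Vector<Integer> sourceVector = new Vector<Integer>(size);
        sourceVector.setSize(size);
        for (int i = 0; i < size; i++) {
            Integer ai = source.get(i);
            int index = calculateIndex(ai, source);
            // Place the value in its estimated slot if that slot is free, otherwise buffer it.
            // Note: the index != i test also buffers values whose estimate equals their
            // current position; it is not needed for correctness.
            if (index != i && sourceVector.get(index) == null) {
                sourceVector.set(index, ai);
            } else {
                buffer.add(ai); // value
            }
        }
        Collections.sort(buffer);
        int bufferSize = buffer.size();
        // Merge the sparse (already ordered) sourceVector with the sorted buffer into source.
        for (int i = 0, j = 0, counter = 0; i < size || j < bufferSize;) {
            if (i < size && j < bufferSize) {
                Integer ai = sourceVector.get(i);
                // Skip empty slots.
                while (ai == null && i < size) {
                    i++;
                    if (i < size) {
                        ai = sourceVector.get(i);
                    }
                }
                if (i == size) {
                    continue; // only buffer values remain
                }
                Integer aj = buffer.get(j);
                if (aj < ai) {
                    source.set(counter, aj);
                    j++;
                } else {
                    source.set(counter, ai);
                    i++;
                }
                counter++;
            } else {
                if (i < size) {
                    Integer ai = sourceVector.get(i);
                    if (ai != null) {
                        source.set(counter, ai);
                        counter++;
                    }
                    i++;
                } else if (j < bufferSize) {
                    Integer aj = buffer.get(j);
                    source.set(counter, aj);
                    j++;
                    counter++;
                }
            }
        }
    }

    private static int calculateIndex(Integer ai, List<Integer> source) {
        int size = source.size();
        long denominator = 2 * (sum - size * minemum);
        if (denominator == 0) {
            // All values are equal (or there is only one value): avoid dividing by zero.
            return 0;
        }
        // Clamp in long arithmetic before casting, so a large quotient cannot overflow int.
        return (int) Math.min(size - 1, ((ai - minemum) * size * (size - 1)) / denominator);
    }

    private static void findMinMaxAndSum(List<Integer> source) {
        long minemum = Long.MAX_VALUE;
        long maximum = Long.MIN_VALUE; // tracked, but not used by the index estimate
        long sum = 0;
        for (int value : source) {
            sum += value;
            if (value < minemum) {
                minemum = value;
            }
            if (value > maximum) {
                maximum = value;
            }
        }
        StatisticSort.minemum = minemum;
        StatisticSort.sum = sum;
    }
}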
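One way to see how the sort might behave on non-uniform data, without timing anything, is to count how many values would end up in the buffer. The sketch below is not part of the original code: `BufferPressureDemo`, `estimateIndex` and `bufferedCount` are hypothetical names, the index estimate is copied from `calculateIndex` with a guard for a zero denominator, and the `index != i` test of the original loop is left out for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: counts the values that would be diverted to the buffer
// because their estimated slot is already taken.
class BufferPressureDemo {

    // Same estimate as calculateIndex in the question, with min and sum passed in.
    static int estimateIndex(long value, long min, long sum, int size) {
        long denominator = 2 * (sum - size * min);
        if (denominator == 0) {
            return 0; // all values equal: everything maps to slot 0
        }
        return (int) Math.min(size - 1, ((value - min) * size * (size - 1)) / denominator);
    }

    static int bufferedCount(List<Integer> source) {
        int size = source.size();
        long min = Long.MAX_VALUE;
        long sum = 0;
        for (int v : source) {
            sum += v;
            min = Math.min(min, v);
        }
        boolean[] taken = new boolean[size];
        int buffered = 0;
        for (int v : source) {
            int index = estimateIndex(v, min, sum, size);
            if (taken[index]) {
                buffered++; // slot collision: this value would go to the buffer
            } else {
                taken[index] = true;
            }
        }
        return buffered;
    }

    public static void main(String[] args) {
        int n = 1000;
        Random random = new Random(1);
        List<Integer> uniform = new ArrayList<>();
        List<Integer> skewed = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            uniform.add(random.nextInt(1_000_000));
            skewed.add(i < n - 1 ? i : 1_000_000_000); // one huge outlier
        }
        System.out.println("uniform: " + bufferedCount(uniform) + " of " + n + " buffered");
        System.out.println("skewed:  " + bufferedCount(skewed) + " of " + n + " buffered");
    }
}
```

With a single large outlier the sum is dominated by that one value, so every other value is mapped to slot 0 and 998 of the 1000 values are buffered. That is the O(N log N) case described above, and inputs of this shape are probably worth including in any real-world test set.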
import java.util.ArrayList;
import java.util.Random;

public abstract class Test {
    protected ArrayList<ArrayList<Integer>> buffer;
    private final Random random = new Random();
    public int numberOfTests = 100;
    public int maxValue = 1000;
    public int numberOfItems = 100;

    // Fills the buffer with numberOfTests lists of random values.
    protected void createBuffer() {
        buffer = new ArrayList<ArrayList<Integer>>();
        for (int i = 0; i < numberOfTests; i++) {
            ArrayList<Integer> list = new ArrayList<Integer>();
            addRandomNumbers(list);
            buffer.add(list);
        }
    }

    // Fills the buffer with a single list holding the given values.
    protected void createBuffer(int... parametes) {
        buffer = new ArrayList<ArrayList<Integer>>();
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < parametes.length; i++) {
            list.add(parametes[i]);
        }
        buffer.add(list);
    }

    // Adds numberOfItems uniformly distributed values from [0, maxValue).
    protected void addRandomNumbers(ArrayList<Integer> list) {
        for (int i = 0; i < numberOfItems; i++) {
            int value = random.nextInt(maxValue);
            list.add(value);
        }
    }

    // Deep-copies the buffer so that each sort gets the same unsorted input.
    protected ArrayList<ArrayList<Integer>> cloneBuffer() {
        ArrayList<ArrayList<Integer>> clonedBuffer = new ArrayList<ArrayList<Integer>>();
        for (int i = 0; i < buffer.size(); i++) {
            ArrayList<Integer> clonedList = new ArrayList<Integer>();
            ArrayList<Integer> list = buffer.get(i);
            for (int element : list) {
                clonedList.add(element);
            }
            clonedBuffer.add(clonedList);
        }
        return clonedBuffer;
    }

    public abstract void test();
}
Performance test
import java.util.ArrayList;
import java.util.Collections;

public class TestPerformance extends Test {
    // Timer is a helper class that is not shown in the question
    // (java.util.Timer has no reset() or check() methods).
    private final Timer timer = new Timer();

    public void test() {
        createBuffer();
        timer.reset();
        testSystem();
        timeResoult("System");
        timer.reset();
        testMy();
        timeResoult("My List");
    }

    public void test(int numberOfTests) {
        long myTotalTime = 0;
        long systemTotalTime = 0;
        for (int i = 0; i < numberOfTests; i++) {
            createBuffer();
            timer.reset();
            testSystem();
            long systemTime = timeResoult();
            systemTotalTime += systemTime;
            timer.reset();
            testMy();
            long myTime = timeResoult();
            myTotalTime += myTime;
            System.out.println("My Time / System Time = " + myTime + " / " + systemTime + " = \t" + ((double) myTime / systemTime));
        }
        System.out.println("My Time / System Time = " + ((double) myTotalTime / systemTotalTime));
    }

    private long timeResoult() {
        return timeResoult(null);
    }

    private long timeResoult(String source) {
        long time = timer.check();
        if (source != null) {
            System.out.println(source + ">\tTime: " + time);
        }
        return time;
    }

    // Note: cloneBuffer() runs inside the timed region in both test methods.
    private void testMy() {
        ArrayList<ArrayList<Integer>> buffer = cloneBuffer();
        for (int i = 0; i < numberOfTests; i++) {
            ArrayList<Integer> list = buffer.get(i);
            StatisticSort.sort(list);
        }
    }

    private void testSystem() {
        ArrayList<ArrayList<Integer>> buffer = cloneBuffer();
        for (int i = 0; i < numberOfTests; i++) {
            ArrayList<Integer> list = buffer.get(i);
            Collections.sort(list);
        }
    }
}
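Two details of the test above may distort the ratio: each timed region includes the `cloneBuffer()` copy, and the system sort always runs first, before the JIT compiler has had much chance to warm up. The following is a minimal sketch of a harness that copies the data outside the timed region and runs some untimed warm-up rounds first. It assumes Java 8 or later (for lambdas and `java.util.function.Consumer`), and `SortBenchmarkSketch`, `copyOf` and `measureNanos` are hypothetical names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical harness; the names are illustrative only.
class SortBenchmarkSketch {

    // Deep copy, so every measurement sorts the same unsorted input.
    static List<List<Integer>> copyOf(List<List<Integer>> lists) {
        List<List<Integer>> copy = new ArrayList<>(lists.size());
        for (List<Integer> list : lists) {
            copy.add(new ArrayList<>(list));
        }
        return copy;
    }

    // Returns the nanoseconds spent sorting a fresh copy of every list.
    static long measureNanos(List<List<Integer>> lists, Consumer<List<Integer>> sorter) {
        List<List<Integer>> work = copyOf(lists); // copied before the clock starts
        long start = System.nanoTime();
        for (List<Integer> list : work) {
            sorter.accept(list);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        List<List<Integer>> data = new ArrayList<>();
        for (int t = 0; t < 100; t++) {
            List<Integer> list = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                list.add(random.nextInt(1_000_000));
            }
            data.add(list);
        }
        Consumer<List<Integer>> systemSort = list -> Collections.sort(list);

        // Untimed warm-up rounds; the count of 20 is an arbitrary assumption.
        for (int round = 0; round < 20; round++) {
            measureNanos(data, systemSort);
        }
        long best = Long.MAX_VALUE;
        for (int round = 0; round < 10; round++) {
            best = Math.min(best, measureNanos(data, systemSort));
        }
        System.out.println("System sort, best of 10 rounds (ns): " + best);
    }
}
```

The same `measureNanos` call could be given `list -> StatisticSort.sort(list)` to time the sort from the question under identical conditions. For results meant to be published, a dedicated benchmarking tool is likely a safer choice than a hand-written loop.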
Main
public static void main(String[] args) {
    TestPerformance testBasics = new TestPerformance();
    testBasics.numberOfTests = 1000;
    testBasics.numberOfItems = 1000;
    testBasics.maxValue = 1000000;
    testBasics.test(1000);
}
Answer 0 (score: 0)
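There is one major problem with comparing your sort against the algorithm in Collections.sort:
Collections.sort is by necessity a pure comparison sort. All it has is a way to tell which of two objects is smaller, and every comparison sort provably needs on the order of n log n comparisons in the worst case.
Your sort does not work on arbitrary objects, because you need an O(1) estimate of where a value will end up in the final array.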
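To illustrate the point, here is a small sketch of what "pure comparison sort" means in practice. `Collections.sort` only needs a `Comparator`, so it can order strings case-insensitively even though a string offers no obvious O(1) numeric estimate of its final position. `ComparisonSortDemo` and `sortedCopy` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the question or the answer.
class ComparisonSortDemo {

    // Sorts a copy of the words using nothing but pairwise comparisons.
    static List<String> sortedCopy(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "Apple", "fig");
        System.out.println(sortedCopy(words)); // prints [Apple, fig, pear]
    }
}
```

A sort based on index estimation would first need a function that maps each string to a number reflecting its rank, and the comparator alone does not provide one.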