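How to sort a List<string> in descending order as fast as possible

Time: 2019-02-20 20:31:36

Tags: c# list performance sorting

I am trying to find a way to sort a list as fast as possible. I know about "bucket sort", which I will probably use (I think?). But before that, I suppose I should first look for the fastest plain sorting algorithm and then use it inside the bucket sort?

The strings look like the following, and I add 100,000 of them in a loop: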

-0,awb/aje - ddfas/asa - asoo/qwa
-1,awb/aje - ddfas/asa - asoo/qwa
-2,awb/aje - ddfas/asa - asoo/qwa

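So I want to sort in descending order by the first comma-separated field, which is a double, such as -0, -1, -2 and so on.

I tried 3 approaches, of which only Approach 1 actually sorts correctly. Approaches 2 and 3 do not sort in correct numeric descending order.

Approach 1, the one that sorts correctly, takes 30 seconds. The thing is that I will have about 3 million elements, not just the 100,000 in this example, so at that rate it would take at least 900 seconds.

My question is: how can we sort 100,000 elements, or more precisely 3 million elements, as fast as possible? Running sortingtestBENCHMARKS() will show the results.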

    public void sortingtestBENCHMARKS()
    {
        List<String> dataLIST = new List<String>(); List<String> sortedLIST = new List<String>(); String resultString = ""; int index = 0;
        DateTime starttime = DateTime.Now; DateTime endtime = DateTime.Now; TimeSpan span = new TimeSpan();
        for (double i = 0; i < 100000; i+=1)
        {
            dataLIST.Add("-" + i + "," + "awb/aje" + " - " + "ddfas/asa" + " - " + "asoo/qwa");
        }
        dataLIST = shuffle(dataLIST);

        /*--------------------------------------------------------------------------*/
        //APPROACH 1: 30 seconds (Sorts correctly in descending order)
        starttime = DateTime.Now;
        dataLIST = sortLIST(dataLIST);
        endtime = DateTime.Now;
        span = endtime - starttime;
        resultString = "Approach 1: " + span.TotalSeconds;
        dataLIST = shuffle(dataLIST);

        /*--------------------------------------------------------------------------*/
        //APPROACH 2: 55 seconds (Sorts INcorrectly in descending order)
        starttime = DateTime.Now;
        for (int i = 0; i < dataLIST.Count; i++) 
        {
            index = sortedLIST.BinarySearch(dataLIST[i]);
            if (index < 0)
            {
                sortedLIST.Insert(~index, dataLIST[i]);
            }
        }
        endtime = DateTime.Now;
        span = endtime - starttime;
        resultString = resultString + "\nApproach 2: " + span.TotalSeconds;

        /*--------------------------------------------------------------------------*/
        //APPROACH 3: 2 seconds (Sorts INcorrectly in descending order)
        starttime = DateTime.Now;
        dataLIST.Sort(); //1.6 seconds
        endtime = DateTime.Now;
        span = endtime - starttime;
        resultString = resultString + "\nApproach 3: " + span.TotalSeconds;

        /*--------------------------------------------------------------------------*/

        MessageBox.Show("Elapsed Times:\n\n" + resultString);
    }
    List<String> sortLIST(List<String> theLIST)
    {
        System.Threading.Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("en-US");
        theLIST.Sort(new Comparison<String>((a, b) =>
        {
            int result = 0;
            double ad = 0;
            double bd = 0;
            NumberFormatInfo provider = new NumberFormatInfo();
            provider.NumberGroupSeparator = ",";
            provider.NumberDecimalSeparator = ".";
            provider.NumberGroupSizes = new int[] { 5 };
            ad = Convert.ToDouble(a.Replace("a", "").Replace("c", "").Split(',')[0], provider);
            bd = Convert.ToDouble(b.Replace("a", "").Replace("c", "").Split(',')[0], provider);
            if (ad < bd)
            {
                result = 1;
            }
            else if (ad > bd)
            {
                result = -1;
            }
            return result;
        }));
        return theLIST;
    }
    List<String> shuffle(List<String> list)
    {
        var randomizedList = new List<String>();
        var rnd = new Random();
        while (list.Count != 0)
        {
            var index = rnd.Next(0, list.Count);
            randomizedList.Add(list[index]);
            list.RemoveAt(index);
        }
        return randomizedList;
    }

3 Answers:

Answer 0: (score: 1)

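It seems to me that you could split the string on the ',' character, strip the '-' from the first item of the split array, and then use OrderBy on the result. Since every value in the sample is negative, ascending order of the stripped magnitudes is the same as descending order of the original numbers.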

    var sorted = dataLIST.OrderBy(i => double.Parse(i.Split(',')[0].TrimStart('-'))).ToList();

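I made a copy of your code, took the one approach of yours that works, and compared it against running OrderBy with the split-string method above. The OrderBy/Split method is more than 30 times faster.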

    public static void sortingtestBENCHMARKS()
    {
        var dataLIST = new List<string>();

        // Create the list
        for (var i = 0; i < 100000; i++)
        {
            dataLIST.Add("-" + i + "," + "awb/aje" + " - " + "ddfas/asa" + " - " + "asoo/qwa");
        }

        // Shuffle the list
        dataLIST = shuffle(dataLIST);

        // Make two copies of the same shuffled list
        var copy1 = dataLIST.ToList();
        var copy2 = dataLIST.ToList();

        // Use a stopwatch for measuring time when benchmark testing
        var stopwatch = new Stopwatch();

        /*--------------------------------------------------------------------------*/
        //APPROACH 1: 2.83 seconds (Sorts correctly in descending order)
        stopwatch.Start();
        copy2 = sortLIST(copy2);
        stopwatch.Stop();
        Console.WriteLine($"sortLIST method: {stopwatch.Elapsed.TotalSeconds} seconds");

        /*--------------------------------------------------------------------------*/
        //APPROACH 2: 0.09 seconds (Sorts correctly in descending order)
        stopwatch.Restart();
        copy1 = copy1.OrderBy(i => double.Parse(i.Split(',')[0].TrimStart('-'))).ToList();
        stopwatch.Stop();
        Console.WriteLine($"OrderBy method:  {stopwatch.Elapsed.TotalSeconds} seconds");

        // Ensure that the lists are sorted identically
        Console.WriteLine($"Lists are the same: {copy1.SequenceEqual(copy2)}");
    }
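One likely reason for the gap is how often each approach parses a string. The comparison passed to List.Sort parses both operands on every comparison, whereas OrderBy, in the implementations I am aware of, evaluates its key selector once per element before sorting. The following is only a rough sketch to make that visible by counting parse calls; the class name ParseCounting, the method Count, and the small scattered sample are all made up for illustration, and it needs C# 7 or later for the tuple return type.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ParseCounting
{
    // Counts how many times a key is parsed by OrderBy and by List.Sort
    // with a parsing comparison (hypothetical helper, not from the thread).
    public static (int orderByParses, int sortParses) Count(int n)
    {
        var data = new List<string>();
        for (int i = 0; i < n; i++)
        {
            // Scatter the values deterministically; intended for small n only.
            data.Add("-" + (i * 7919 % n) + ",awb/aje");
        }

        int orderByParses = 0;
        var viaOrderBy = data.OrderBy(s =>
        {
            orderByParses++; // one increment per key-selector call
            return double.Parse(s.Split(',')[0].TrimStart('-'), CultureInfo.InvariantCulture);
        }).ToList();

        int sortParses = 0;
        Func<string, double> parse = s =>
        {
            sortParses++; // one increment per parse inside the comparison
            return double.Parse(s.Split(',')[0], CultureInfo.InvariantCulture);
        };
        var viaSort = new List<string>(data);
        // Descending numeric order: compare b against a.
        viaSort.Sort((a, b) => parse(b).CompareTo(parse(a)));

        return (orderByParses, sortParses);
    }
}
```

Calling ParseCounting.Count(1000) and printing both counters should show the List.Sort counter well above the OrderBy counter, since every comparison parses two strings.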

Answer 1: (score: 0)

    var sortedList = theLIST.OrderByDescending(s=>s);
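Note that ordering by the string itself compares text, not numbers, which is probably why Approaches 2 and 3 in the question come out in the wrong order. Here is a small sketch of the difference. It uses StringComparer.Ordinal so the result does not depend on the current culture, which is my assumption and not what the line above uses (that one relies on the default comparer); the class OrderingDemo and its two methods are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class OrderingDemo
{
    // Descending order when the rows are compared as plain text (ordinal).
    public static List<string> AsText(IEnumerable<string> rows) =>
        rows.OrderByDescending(s => s, StringComparer.Ordinal).ToList();

    // Descending order when the leading field is parsed as a number.
    public static List<string> AsNumbers(IEnumerable<string> rows) =>
        rows.OrderByDescending(
            s => double.Parse(s.Split(',')[0], CultureInfo.InvariantCulture)).ToList();
}
```

For the rows "-8,x", "-90,x", "-10,x", "-1,x", AsText returns -90, -8, -10, -1, while AsNumbers returns -1, -8, -10, -90. In the text comparison "-1,x" sorts below "-10,x" because ',' has a lower character code than '0'.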

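Answer 2: (score: 0)

You should optimize the inner loop as much as possible, because that is where most of the processing time is spent. I suggest implementing the string tokenizer yourself, since you only need the first token and the strings are very uniform. A second optimization you might want is to multiply all the numbers by -1, so that sorting the list in reverse order becomes trivial. Something like this: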

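    // Requires: using System.Globalization;
    private static double getNumberFromString(string s)
    {
        // Scan only as far as the first comma; the rest of the string is irrelevant.
        int posFirstComma = 0;
        while (posFirstComma < s.Length && s[posFirstComma] != ',') posFirstComma++;
        // Negate the value so that an ascending sort yields descending order.
        return -double.Parse(s.Substring(0, posFirstComma), CultureInfo.InvariantCulture);
    }

    // A Comparison<string> must return an int, so use CompareTo rather than subtraction.
    myData.Sort((a, b) => getNumberFromString(a).CompareTo(getNumberFromString(b)));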

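Personally, I would not touch the sorting algorithm in the library itself, since it is already thoroughly optimized. Just optimize everything that runs inside the comparison loop.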
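A related idea, which goes a step beyond what this answer wrote, is to take the parsing out of the comparison entirely: parse each key once into a double[] and let Array.Sort(keys, items) reorder the strings alongside the keys. This is a minimal sketch under the assumption that every string starts with a number parseable under the invariant culture; KeyedSort, LeadingNumber and SortDescending are hypothetical names, and I have not benchmarked it against the timings quoted in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class KeyedSort
{
    // Parses the text before the first comma as a double (hypothetical helper).
    public static double LeadingNumber(string s)
    {
        int comma = s.IndexOf(',');
        string token = comma < 0 ? s : s.Substring(0, comma);
        return double.Parse(token, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // Returns a new list ordered by the leading number, largest first.
    public static List<string> SortDescending(List<string> source)
    {
        string[] items = source.ToArray();
        double[] keys = new double[items.Length];
        for (int i = 0; i < items.Length; i++)
        {
            keys[i] = LeadingNumber(items[i]); // each string is parsed exactly once
        }

        Array.Sort(keys, items); // ascending by key; items are moved with their keys
        Array.Reverse(items);    // flip to descending order
        return new List<string>(items);
    }
}
```

Usage would be something like dataLIST = KeyedSort.SortDescending(dataLIST). Array.Sort is not a stable sort, so rows with equal keys may end up in any relative order.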