Question

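I have a text file. I read each line, which contains an Id and someText, and convert it into an object. I want to group the objects so that I end up with two lists: a unique list and a duplicate list. The data is very large, up to several hundred thousand lines. Which data structure is best? Please provide some sample code in C#. Thank you very much!

For example:

Original list read from the text file: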

{(1, someText),(2, someText),(3, someText),(3, someText1),(4, someText)}

Unique list:

{(1, someText),(2, someText),(4, someText)}

Duplicate list:

{(3, someText),(3, someText1)}

Answer 1

Here is an example using LINQ:

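    // Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
    var rnd = new Random();
    int cnt = 0; // Stands in for the Id parsed from each line.
    var values = new List<KeyValuePair<int, string>>();

    using (var sr = new StreamReader("enterYourPathHere"))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            // Convert the line to your object (KeyValuePair is used here for testing).
            values.Add(new KeyValuePair<int, string>(cnt, line));
            // Increment the id with a 50% chance so that some ids repeat.
            // The upper bound of Random.Next is exclusive, so Next(0, 2) yields 0 or 1.
            if (rnd.Next(0, 2) == 1) cnt++;
        }
    }

    // Group once by id and materialize the groups so both queries reuse them.
    var groups = values.GroupBy(x => x.Key).ToList();
    // Items whose id occurs exactly once.
    var unique = groups.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
    // Every item whose id occurs more than once.
    var duplicates = groups.Where(g => g.Count() > 1).SelectMany(g => g).ToList();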
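
For several hundred thousand rows, one possible alternative to the LINQ query is a single pass over the data with a `Dictionary` keyed by Id. The following is only a minimal sketch under stated assumptions: the `Record` type and the `Partitioner.Split` helper are hypothetical names introduced for illustration and do not appear in the original post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; the original post does not name its class.
public sealed class Record
{
    public int Id { get; }
    public string Text { get; }
    public Record(int id, string text) { Id = id; Text = text; }
}

// Hypothetical helper, not part of any library.
public static class Partitioner
{
    // Splits records into those whose Id occurs once and those whose Id repeats.
    public static (List<Record> Unique, List<Record> Duplicates) Split(IEnumerable<Record> records)
    {
        // One bucket per Id; Dictionary lookups and inserts are O(1) on average.
        var buckets = new Dictionary<int, List<Record>>();
        foreach (var r in records)
        {
            if (!buckets.TryGetValue(r.Id, out var bucket))
            {
                bucket = new List<Record>();
                buckets[r.Id] = bucket;
            }
            bucket.Add(r);
        }

        var unique = new List<Record>();
        var duplicates = new List<Record>();
        foreach (var bucket in buckets.Values)
        {
            // A bucket with a single record goes to unique; larger buckets go to duplicates.
            (bucket.Count == 1 ? unique : duplicates).AddRange(bucket);
        }
        return (unique, duplicates);
    }
}
```

Applied to the sample from the question, `Split` should place the records with ids 1, 2 and 4 in `Unique` and both records with id 3 in `Duplicates`. Keeping the buckets in a dictionary also makes it easy to look up all records for a given Id later, at the cost of holding every record in memory.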
