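I have a text file in which each line holds an Id and someText; I read each line and convert it to an object. I want to group the objects so that I end up with two lists: a unique list and a duplicate list. The data is very large, several hundred thousand lines. Which data structure is best for this? Please provide some sample code in C#. Thanks a lot!
For example:
Original list read from the text file:
{(1, someText),(2, someText),(3, someText),(3, someText1),(4, someText)}
Unique list:
{(1, someText),(2, someText),(4, someText)}
Duplicate list:
{(3, someText),(3, someText1)}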
Answer 0 (score: 0)
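Here is an example using LINQ:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

Random rnd = new Random();
int cnt = 0; // Stands in for real ids in this test.
List<KeyValuePair<int, string>> values = new List<KeyValuePair<int, string>>();
using (StreamReader sr = new StreamReader("enterYourPathHere"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Convert the line to your object (KeyValuePair is used here for testing).
        values.Add(new KeyValuePair<int, string>(cnt, line));
        // Increment the id with a 50% chance so that some ids repeat.
        // Next(0, 2) returns 0 or 1; the upper bound is exclusive.
        if (rnd.Next(0, 2) == 1) cnt++;
    }
}
// Group once by id, then split on group size.
var groups = values.GroupBy(x => x.Key).ToList();
// Ids that occur exactly once.
var unique = groups.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
// Every item whose id occurs more than once.
var duplicates = groups.Where(g => g.Count() > 1).SelectMany(g => g).ToList();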
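On the data-structure part of the question, one option is a Dictionary keyed by Id, which groups everything in a single pass with roughly constant-time lookups per line. The following is only a minimal sketch: the DuplicateSplitter class, the Split method, and the (int Id, string Text) tuple shape are hypothetical names chosen for illustration and are not from the question. The order of items in the two result lists should not be relied on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; the names are not from the question.
public static class DuplicateSplitter
{
    // Splits items into those whose Id occurs once and those whose Id repeats.
    public static (List<(int Id, string Text)> Unique, List<(int Id, string Text)> Duplicates)
        Split(IEnumerable<(int Id, string Text)> items)
    {
        // One bucket per Id; filled in a single pass over the input.
        var byId = new Dictionary<int, List<(int Id, string Text)>>();
        foreach (var item in items)
        {
            if (!byId.TryGetValue(item.Id, out var bucket))
            {
                bucket = new List<(int Id, string Text)>();
                byId[item.Id] = bucket;
            }
            bucket.Add(item);
        }

        var unique = new List<(int Id, string Text)>();
        var duplicates = new List<(int Id, string Text)>();
        foreach (var bucket in byId.Values)
        {
            if (bucket.Count == 1)
                unique.Add(bucket[0]);        // Id seen exactly once
            else
                duplicates.AddRange(bucket);  // Id seen two or more times
        }
        return (unique, duplicates);
    }
}
```

With the sample data from the question, Unique would hold the items with ids 1, 2 and 4, and Duplicates would hold both items with id 3. For a few hundred thousand lines this should fit comfortably in memory, since each line is stored once.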