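I have about 250,000 records marked as Boss, and each Boss has 2 to 10 employees. Every day I need the details of the employees; there are about 1,000,000 of them. I use LINQ to get a distinct list of the day's staff. Consider the following C# LINQ query and models: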
void Main()
{
    List<Boss> BossList = new List<Boss>()
    {
        new Boss()
        {
            EmpID = 101,
            Name = "Harry",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() { EmpID = 102, Name = "Peter", Department = "Development", Gender = "Male" },
                new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
            }
        },
        new Boss()
        {
            EmpID = 104,
            Name = "Raj",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() { EmpID = 105, Name = "Kaliya", Department = "Development", Gender = "Male" },
                new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
            }
        },
        // ..... ~ 250,000 Records ......
    };

    // Copy each boss into a new Person, append that boss's employees,
    // then keep one Person per EmpID.
    List<Person> staffList = BossList
        .SelectMany(x =>
            new[] { new Person { Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID } }
                .Concat(x.Employees))
        .GroupBy(x => x.EmpID)     // Group by employee ID
        .Select(g => g.First())    // and select a single instance for each unique employee
        .ToList();
}
public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
}

public class Boss
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
    public List<Person> Employees { get; set; }
}
With the LINQ above I get a list of distinct staff (bosses and employees), which contains more than 1,000,000 records. From that list I need to search for "Raj":
staffList.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant()));
This operation takes 3 to 5 minutes to return a result.
How can I make it more efficient? Please help...
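To make the question's deduplication step concrete, here is a trimmed sketch that runs the same GroupBy/First query on just the two sample bosses. The models are cut down to EmpID and Name, and the StaffQuery, UniqueStaff and SampleBosses names are hypothetical. Emma Watson (103) works for both bosses but appears only once in the result. Note that GroupBy builds every group before yielding its first result, so with ~1,000,000 people the whole set is buffered before Select(g => g.First()) runs.

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed versions of the question's models (Department and Gender omitted for brevity).
public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

public class Boss
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public List<Person> Employees { get; set; } = new List<Person>();
}

public static class StaffQuery
{
    // Same shape as the question's query: copy each boss into a new Person,
    // append that boss's employees, then keep the first Person seen per EmpID.
    public static List<Person> UniqueStaff(IEnumerable<Boss> bosses) =>
        bosses
            .SelectMany(b => new[] { new Person { EmpID = b.EmpID, Name = b.Name } }
                .Concat(b.Employees))
            .GroupBy(p => p.EmpID)
            .Select(g => g.First())
            .ToList();

    // The two sample bosses from the question; Emma Watson (103) reports to both.
    public static List<Boss> SampleBosses() => new List<Boss>
    {
        new Boss { EmpID = 101, Name = "Harry", Employees =
            { new Person { EmpID = 102, Name = "Peter" }, new Person { EmpID = 103, Name = "Emma Watson" } } },
        new Boss { EmpID = 104, Name = "Raj", Employees =
            { new Person { EmpID = 105, Name = "Kaliya" }, new Person { EmpID = 103, Name = "Emma Watson" } } },
    };
}
```

Calling `StaffQuery.UniqueStaff(StaffQuery.SampleBosses())` returns five people, with EmpIDs 101 to 105 in the order their IDs were first seen.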
Answer 0 (score: 2)
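If you change Boss to inherit from Person (public class Boss : Person), not only do you no longer need to duplicate your properties in Person and Boss, you also don't have to create a new Person instance for every Boss, because a Boss already is a Person: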
List<Person> staff = BossList
    .Concat<Person>(BossList.SelectMany(x => x.Employees))   // each Boss is already a Person
    .DistinctBy(p => p.EmpID)
    .ToList();
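A minimal sketch of that model change, keeping only EmpID and Name for brevity (InheritanceDemo and AllPeople are hypothetical names). Because IEnumerable<T> is covariant, a List<Boss> can be used as an IEnumerable<Person>, so the bosses are concatenated with their employees directly and each boss in the result is the original object rather than a copy.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

// Boss inherits EmpID and Name from Person, so nothing is duplicated.
public class Boss : Person
{
    public List<Person> Employees { get; set; } = new List<Person>();
}

public static class InheritanceDemo
{
    // Bosses first, then all their employees; no new Person objects are allocated
    // because List<Boss> converts to IEnumerable<Person>.
    public static List<Person> AllPeople(List<Boss> bosses) =>
        bosses.Concat<Person>(bosses.SelectMany(b => b.Employees)).ToList();
}
```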
Here, DistinctBy is defined as:
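// Extension methods must be declared in a non-generic static class; the class name is arbitrary.
// (On .NET 6 and later, System.Linq already provides Enumerable.DistinctBy.)
public static class EnumerableExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // Add returns false for a key that is already present, so only the first element per key is yielded.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}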
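The difference from GroupBy(...).Select(g => g.First()) is that this version streams: it yields each element as soon as its key is seen for the first time and keeps only the keys in memory, instead of grouping the whole input up front. The sketch below restates the same HashSet technique under a hypothetical name, FirstPerKey, so it cannot clash with the built-in Enumerable.DistinctBy on .NET 6 and later.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByDemo
{
    // Hypothetical name for the same technique as the answer's DistinctBy.
    public static IEnumerable<T> FirstPerKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // Yielded immediately, before the rest of the source is read.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

Because it is lazy, taking the first three results of `FirstPerKey(Enumerable.Range(0, int.MaxValue), x => x % 3)` finishes immediately, whereas the GroupBy version would enumerate the entire range before producing anything.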
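Also, in your comparison you convert every Name to lowercase before comparing (and "Raj".ToLowerInvariant() is recomputed for every element too), which creates a lot of strings you don't need. Instead, try something like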
staffList.Where(m => m.Name.Equals("Raj", StringComparison.InvariantCultureIgnoreCase));
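Also note that your use of Contains will also match names such as Rajamussen and mirajii, which is probably not what you expect. The Equals version above matches only the whole name.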
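A small sketch of the two behaviours side by side, assuming ordinal (culture-insensitive) case folding is acceptable for employee names; NameSearch, ExactMatches and SubstringMatches are hypothetical helpers. StringComparison.OrdinalIgnoreCase compares the characters in place, so no lowered copy of each name is allocated. On .NET Core 2.1 and later, `name.Contains(query, StringComparison.OrdinalIgnoreCase)` is an equivalent spelling of the IndexOf check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameSearch
{
    // Whole-name match, case-insensitive, without creating lowercase copies.
    public static List<string> ExactMatches(IEnumerable<string> names, string query) =>
        names.Where(n => string.Equals(n, query, StringComparison.OrdinalIgnoreCase)).ToList();

    // Substring match (what the question's Contains did), also allocation-free per name.
    public static List<string> SubstringMatches(IEnumerable<string> names, string query) =>
        names.Where(n => n.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0).ToList();
}
```

With the names Raj, Rajamussen, mirajii and Harry, `ExactMatches(names, "raj")` returns only Raj, while `SubstringMatches(names, "Raj")` returns Raj, Rajamussen and mirajii.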
Answer 1 (score: 0)
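Would it be possible to change staffList to a Dictionary? A better search structure, such as a Dictionary or a SortedList, is what will give you the biggest improvement.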
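Worth spelling out: a Dictionary only speeds things up when the query is a whole key; testing every key with Contains is still a full scan. Assuming exact, case-insensitive name queries are what is needed, here is a sketch that builds such an index once with ToLookup, so several people sharing a name are all kept. Person is trimmed to EmpID and Name, and StaffIndex and ByName are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

public static class StaffIndex
{
    // Built once; lookups by name are then hash-based instead of a scan.
    // The comparer makes "Raj" and "RAJ" the same key.
    public static ILookup<string, Person> ByName(IEnumerable<Person> staff) =>
        staff.ToLookup(p => p.Name, StringComparer.OrdinalIgnoreCase);
}
```

After the one-time build, `StaffIndex.ByName(staffList)["raj"]` returns every person whose name is exactly Raj in any casing; a name that is not present yields an empty sequence rather than an exception.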
I have tested the code below, and it runs in just a few seconds.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    private static void Main()
    {
        List<Boss> BossList = new List<Boss>();
        var b1 = new Boss()
        {
            EmpID = 101,
            Name = "Harry",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() { EmpID = 102, Name = "Peter", Department = "Development", Gender = "Male" },
                new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
            }
        };
        var b2 = new Boss()
        {
            EmpID = 104,
            Name = "Raj",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() { EmpID = 105, Name = "Kaliya", Department = "Development", Gender = "Male" },
                new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
            }
        };

        // Pad each boss with 1.5 million generated employees to simulate a large data set.
        Random r = new Random();
        var genders = new[] { "Male", "Female" };
        for (int i = 0; i < 1500000; i++)
        {
            // Random.Next's upper bound is exclusive, so genders.Length is needed to pick either entry.
            b1.Employees.Add(new Person { Name = "Name" + i, Department = "Department" + i, Gender = genders[r.Next(genders.Length)], EmpID = 200 + i });
            b2.Employees.Add(new Person { Name = "Nam" + i, Department = "Department" + i, Gender = genders[r.Next(genders.Length)], EmpID = 1000201 + i });
        }
        BossList.Add(b1);
        BossList.Add(b2);

        Stopwatch sw = Stopwatch.StartNew();
        var emps = BossList
            .SelectMany(x =>
                new[] { new Person { Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID } }
                    .Concat(x.Employees))
            .GroupBy(x => x.EmpID)     // Group by employee ID
            .Select(g => g.First());
        var staffList = emps.ToList();
        // Build the other structures from staffList so the GroupBy query is not re-run.
        var staffDict = staffList.ToDictionary(p => p.Name.ToLowerInvariant() + p.EmpID);
        var staffSortedList = new SortedList<string, Person>(staffDict);
        Console.WriteLine("Time to load staffList = " + sw.ElapsedMilliseconds + "ms");

        var rajKeyText = "Raj".ToLowerInvariant();

        sw.Restart();
        var rajs1 = staffList.AsParallel().Where(p => p.Name.ToLowerInvariant().Contains(rajKeyText)).ToList();
        Console.WriteLine("Time to find Raj in List = " + sw.ElapsedMilliseconds + "ms");

        sw.Restart();
        var rajs2 = staffDict.AsParallel().Where(kvp => kvp.Key.Contains(rajKeyText)).ToList();
        Console.WriteLine("Time to find Raj in Dictionary = " + sw.ElapsedMilliseconds + "ms");

        sw.Restart();
        var rajs3 = staffSortedList.AsParallel().Where(kvp => kvp.Key.Contains(rajKeyText)).ToList();
        Console.WriteLine("Time to find Raj in SortedList = " + sw.ElapsedMilliseconds + "ms");

        Console.ReadLine();
    }

    public class Person
    {
        public int EmpID { get; set; }
        public string Name { get; set; }
        public string Department { get; set; }
        public string Gender { get; set; }
    }

    public class Boss
    {
        public int EmpID { get; set; }
        public string Name { get; set; }
        public string Department { get; set; }
        public string Gender { get; set; }
        public List<Person> Employees { get; set; }
    }
}
Output 1: (console screenshot not preserved)
Output 2, using .AsParallel() for the search: (console screenshot not preserved)
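In the benchmark above, staffDict and staffSortedList are still scanned key by key with Contains, so the SortedList's ordering is never used. If a prefix query is acceptable (names that start with "raj" rather than contain it, so mirajii would no longer match), the sorted keys do allow a binary search. A sketch under these assumptions: the SortedList is built with StringComparer.Ordinal (the benchmark's default comparer is culture-sensitive, which would not agree with an ordinal binary search), keys are already lowercase as in staffDict, and PrefixSearch and StartingWith are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Returns the values whose key starts with `prefix`.
    // Assumes `list` was created with StringComparer.Ordinal.
    public static List<TValue> StartingWith<TValue>(SortedList<string, TValue> list, string prefix)
    {
        IList<string> keys = list.Keys;
        int lo = 0, hi = keys.Count;
        // Lower bound: first index whose key is >= prefix in ordinal order.
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(keys[mid], prefix) < 0) lo = mid + 1;
            else hi = mid;
        }
        // In ordinal order, every key sharing the prefix is contiguous from lo onward.
        var result = new List<TValue>();
        for (int i = lo; i < keys.Count && keys[i].StartsWith(prefix, StringComparison.Ordinal); i++)
            result.Add(list.Values[i]);
        return result;
    }
}
```

Building the list as `new SortedList<string, Person>(staffDict, StringComparer.Ordinal)` and calling `StartingWith(staffSortedList, "raj")` then costs one binary search plus a walk over the matching entries.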
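In other words, if you can't switch to a faster data structure, you can still speed up the search, by spreading it across CPU cores, simply by changing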
staffList.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant()));
to
staffList.AsParallel().Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant()));
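AsParallel only splits the same linear scan across the available cores, so at best it divides the search time by roughly the core count. It combines well with the earlier point about avoiding ToLowerInvariant: in the form above, both "Raj".ToLowerInvariant() and m.Name.ToLowerInvariant() are recomputed for every element. A sketch combining both ideas (ParallelSearch and Find are hypothetical names; AsOrdered is only there to keep results in source order):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelSearch
{
    // PLINQ scan with an in-place, case-insensitive substring test (no lowercase copies).
    public static List<string> Find(IEnumerable<string> names, string query) =>
        names.AsParallel()
             .AsOrdered() // keep source order; drop this if order does not matter
             .Where(n => n.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0)
             .ToList();
}
```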