This is my class:
public class Record
{
    public string PersonName { get; set; }
    public string RequestID { get; set; }
}
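I have a database table that corresponds to this class, and I load all of it into memory at startup. I am trying to find the relationship between two people with the following algorithm: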
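1. Take all records of the first person and all records of the second person.
2. For each record of the first person, find every record that shares its RequestID (the related people).
3. For each related person, take all of that person's records and check whether any of them shares a RequestID with a record of the second person.
Here is my implementation of the algorithm above: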
foreach (var elem in listofFirstPerson)
{
    // I actually get distinct records from the actual list; the distinct version count is about 100k
    List<Record> listofRelatedPeople = RecordList.Where(r => r.RequestID == elem.RequestID).ToList();
    foreach (var relatedPerson in listofRelatedPeople)
    {
        List<Record> listofRecordsforRelatedPerson = RecordList.Where(r => r.PersonName == relatedPerson.PersonName).ToList();
        for (int i = 0; i < listofRecordsforRelatedPerson.Count; i++)
        {
            for (int j = 0; j < listofSecondPerson.Count; j++)
            {
                if (listofRecordsforRelatedPerson[i].RequestID == listofSecondPerson[j].RequestID)
                {
                    // break all loops and do stuff
                }
            }
        }
    }
}
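This algorithm works, but it is very slow. As I mentioned, listofRelatedPeople holds about 100k records, and the loop gets through only a few hundred records in about 20 seconds. How can I make this algorithm faster? Is there a faster approach? Thanks in advance.
Edit:
My list contains records for several people. Suppose I pick Jason and Kevin. Their RequestIDs are not the same, so I need to find a relationship between them. I therefore list the users who share a RequestID with Jason, which gives Larry and Tom. Then I fetch all of Larry's records and find that none of them has the same RequestID as Kevin. So I move on to Tom, see that Tom has a record with the same RequestID as Kevin, pick Tom, and I am done.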
Answer 0 (score: 2)
As I understand it, your current algorithm can be expressed in LINQ as follows:
static Record FirstRelated(List<Record> records, string firstName, string secondName)
{
    var listofFirstPerson = records.Where(r => r.PersonName == firstName).ToList();
    var listofSecondPerson = records.Where(r => r.PersonName == secondName).ToList();
    var result = (
        from r1 in listofFirstPerson // (1)
        from r2 in records // (2)
        where r2.RequestID == r1.RequestID
        from r3 in records // (3)
        where r3.PersonName == r2.PersonName
        from r4 in listofSecondPerson // (4)
        where r4.RequestID == r3.RequestID
        select r2
    ).FirstOrDefault();
    return result;
}
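So basically you have 4 nested loops. If we denote
N = records.Count
M1 = listofFirstPerson.Count
M2 = listofSecondPerson.Count
then the time complexity of the algorithm is O(M1 * N * N * M2), which with a large N will normally cause performance problems.
Looking at the implementation above, one can notice that the same result can be achieved by merging (1) with (2) and (3) with (4), and then correlating the two resulting sets by PersonName: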
var firstRelated =
    from r1 in listofFirstPerson
    from r2 in records
    where r2.RequestID == r1.RequestID
    select r2;
var secondRelated =
    from r4 in listofSecondPerson
    from r3 in records
    where r3.RequestID == r4.RequestID
    select r3;
var result = (
    from r1 in firstRelated
    from r2 in secondRelated
    where r2.PersonName == r1.PersonName
    select r1
).FirstOrDefault();
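So far we have not improved anything; the algorithm still has the same time complexity. But it gives us an idea: since firstRelated and secondRelated are now independent, there is no need to execute secondRelated for every record of firstRelated. Instead, we can prepare a fast hash lookup structure from secondRelated in advance (O(1) average lookup time) and use it while iterating firstRelated once. That leads to a much better time complexity of roughly O((M1 + M2) * N), almost as if the last two inner loops in your code, the ones causing the slowness, had been removed.
Also note that we no longer need to build the two initial lists, because we process firstRelated and secondRelated only once.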
So the final solution looks something like this:
var firstRelated =
    from r1 in records
    where r1.PersonName == firstName
    from r2 in records
    where r2.RequestID == r1.RequestID
    select r2;
var secondRelated =
    from r4 in records
    where r4.PersonName == secondName
    from r3 in records
    where r3.RequestID == r4.RequestID
    select r3;
Now either let the LINQ join operator do the efficient correlation for us:
var result = (
    from r1 in firstRelated
    join r2 in secondRelated on r1.PersonName equals r2.PersonName
    select r1
).FirstOrDefault();
or do it manually: build a HashSet of the PersonName values in secondRelated and return the first record of firstRelated whose PersonName is in that set.
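The manual HashSet variant could look roughly like the sketch below. It is not the answer's original code: the class name RelationFinder and the method name FirstRelatedViaHashSet are made up for illustration, Record is repeated only so the snippet compiles on its own, and the sketch also hashes the RequestIDs (an extra assumption) so that every step is a single pass over records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string PersonName { get; set; }
    public string RequestID { get; set; }
}

public static class RelationFinder
{
    // Hypothetical helper: returns the first record of a person who shares a
    // RequestID with firstName and also shares a RequestID with secondName,
    // or null when no such person exists.
    public static Record FirstRelatedViaHashSet(
        List<Record> records, string firstName, string secondName)
    {
        // RequestIDs used by each of the two people.
        var firstIds = new HashSet<string>(
            records.Where(r => r.PersonName == firstName).Select(r => r.RequestID));
        var secondIds = new HashSet<string>(
            records.Where(r => r.PersonName == secondName).Select(r => r.RequestID));

        // Names of everyone who shares a request with the second person
        // (this plays the role of secondRelated).
        var secondRelatedNames = new HashSet<string>(
            records.Where(r => secondIds.Contains(r.RequestID)).Select(r => r.PersonName));

        // One pass over the records that share a request with the first person
        // (this plays the role of firstRelated), with an O(1) average name lookup.
        return records
            .Where(r => firstIds.Contains(r.RequestID))
            .FirstOrDefault(r => secondRelatedNames.Contains(r.PersonName));
    }
}
```

Like the LINQ versions above, this sketch does not exclude the two people themselves, so if they share a RequestID directly it may return one of their own records.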
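Answer 1 (score: 1)
Calling ".Where()" and ".ToList()" inside the loops is slow, because every call scans the whole list and copies the matches.
You can map "RecordList" into two dictionaries, one keyed by "RequestID" and the other by "PersonName". Do this before the foreach. It should run much faster.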
var dictionary1 = RecordList.GroupBy(f => f.RequestID).ToDictionary(f => f.Key, v => v.ToArray());
var dictionary2 = RecordList.GroupBy(f => f.PersonName).ToDictionary(f => f.Key, v => v.ToArray());
Then, inside the foreach, you can use them like this:
var listofRelatedPeople = dictionary1[elem.RequestID];
var listofRecordsforRelatedPerson= dictionary2[relatedPerson.PersonName];
Of course, it is better to use dictionary1.TryGetValue() in case the key does not exist.
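Putting the two dictionaries and TryGetValue together, the question's loops might be rewritten along these lines. This is only a sketch under assumptions: the names DictionaryLookup, FindLinkingPerson, byRequest and byPerson are invented here, skipping the two people themselves is an added choice, and keys are assumed to be non-null because ToDictionary rejects null keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string PersonName { get; set; }
    public string RequestID { get; set; }
}

public static class DictionaryLookup
{
    // Hypothetical helper: returns the name of a person who links firstName
    // and secondName through shared RequestIDs, or null if there is none.
    public static string FindLinkingPerson(
        List<Record> recordList, string firstName, string secondName)
    {
        // Build both lookups once, before any loop.
        var byRequest = recordList.GroupBy(r => r.RequestID)
                                  .ToDictionary(g => g.Key, g => g.ToArray());
        var byPerson = recordList.GroupBy(r => r.PersonName)
                                 .ToDictionary(g => g.Key, g => g.ToArray());

        // TryGetValue avoids a KeyNotFoundException for unknown names.
        if (!byPerson.TryGetValue(firstName, out var firstRecords) ||
            !byPerson.TryGetValue(secondName, out var secondRecords))
            return null;

        var secondIds = new HashSet<string>(secondRecords.Select(r => r.RequestID));

        foreach (var elem in firstRecords)
        {
            if (!byRequest.TryGetValue(elem.RequestID, out var relatedPeople))
                continue;

            foreach (var related in relatedPeople)
            {
                // Assumption: the two people themselves do not count as a link.
                if (related.PersonName == firstName || related.PersonName == secondName)
                    continue;

                if (byPerson.TryGetValue(related.PersonName, out var relatedRecords) &&
                    relatedRecords.Any(r => secondIds.Contains(r.RequestID)))
                    return related.PersonName;
            }
        }
        return null;
    }
}
```

Each dictionary access replaces one full scan of RecordList, which is where the original loops spend most of their time.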
Update
If you want to do it in C#, one possible solution is:
var recordList = new Record[]
{
    new Record() {RequestID = "1", PersonName = "User1"},
    new Record() {RequestID = "2", PersonName = "User1"},
    new Record() {RequestID = "3", PersonName = "User2"},
    new Record() {RequestID = "1", PersonName = "User2"},
    new Record() {RequestID = "4", PersonName = "User3"},
    new Record() {RequestID = "5", PersonName = "User3"},
    new Record() {RequestID = "1", PersonName = "User4"},
    new Record() {RequestID = "6", PersonName = "User4"},
    new Record() {RequestID = "7", PersonName = "User5"},
    new Record() {RequestID = "1", PersonName = "User5"},
};
// RequestID -> names of everyone on that request
var dictionary1 = recordList.GroupBy(f => f.RequestID).ToDictionary(f => f.Key, v => v.Select(z => z.PersonName).ToArray());
// PersonName -> RequestIDs of that person
var dictionary2 = recordList.GroupBy(f => f.PersonName).ToDictionary(f => f.Key, v => v.Select(z => z.RequestID).ToArray());
var rec1 = dictionary2["User1"]; // all RequestIDs for User1
var rec2 = dictionary2["User2"]; // all RequestIDs for User2
var ids = rec1.Intersect(rec2); // only the RequestIDs both users have; Intersect already returns distinct values
foreach (var id in ids)
{
    var users = dictionary1[id];
    // for id "1": users = User1, User2, User4, User5
    if (users.Length > 2)
        break;
}
Update 2
SQL version (MSSQL). This should be much faster than the C# version:
CREATE TABLE #tmp (ID varchar(max), Name varchar(max))
INSERT INTO #tmp (ID, Name)
SELECT '1', 'User1' UNION ALL
SELECT '2', 'User1' UNION ALL
SELECT '3', 'User2' UNION ALL
SELECT '1', 'User2' UNION ALL
SELECT '4', 'User3' UNION ALL
SELECT '5', 'User3' UNION ALL
SELECT '1', 'User4' UNION ALL
SELECT '6', 'User4'
SELECT C.Name
FROM #tmp A
INNER JOIN #tmp B ON A.ID = B.ID
INNER JOIN #tmp C ON A.ID = C.ID
WHERE A.Name = 'User1' and B.Name = 'User2' AND C.Name NOT IN ('User1', 'User2')
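The result will be "User4".
Answer 2 (score: 1)
I think you should let the database do the work; it will be faster.
The query requests all the RequestIDs of person 1 and checks whether any of them matches one of person 2's.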
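Answer 3 (score: 1)
The grouping can be done in a single pass. The advantage is not only that one pass is faster: if you run LINQ against the database, the grouping is executed by the DB on the server, which reduces the amount of data sent to the client and lets the server speed things up with indexes and the like.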
var source = new List<Record> { };
var grouped = source
    .GroupBy(x => x.RequestID)
    // Only groups with more than one entry
    .Where(x => x.Count() > 1);
// Loop through the data like so
foreach (var group in grouped)
{
    Console.WriteLine("Request: " + group.Key);
    foreach (Record record in group)
        Console.WriteLine(" " + record.PersonName);
}
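If you want the PersonName property to act as a kind of unique identifier, so that the same person appearing several times under one RequestID is counted only once, you can do this: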
var source = new List<Record> { };
var grouped = source
    .GroupBy(x => x.RequestID)
    // Select a key + only unique names
    .Select(x => new { Key = x.Key, Data = x.Select(r => r.PersonName).Distinct() })
    // Only groups with more than 1 entry
    .Where(x => x.Data.Count() > 1);
// To loop through the data
foreach (var group in grouped)
{
    Console.WriteLine("Group: " + group.Key);
    foreach (var item in group.Data)
    {
        // Data holds the names themselves (strings), so print the item directly
        Console.WriteLine(" " + item);
    }
}
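Answer 4 (score: 0)
List<Record> listofRecordsforRelatedPerson = RecordList.Where(r => r.PersonName
-> change the record list to relatedPeople, and in your two for loops use:
for (int j = 0; j <= listofSecondPerson.Count; j++)
{...}
Otherwise, for example, if your count is 25 and j = 25, it will do nothing lol
Be aware, though, that with <= the last iteration reads listofSecondPerson[j] with j equal to Count, which is past the end of the list and throws ArgumentOutOfRangeException, so j < Count is the bound to keep.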