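I'm having trouble with the speed of a search function I'm writing. The function's steps are described below:
The goal of the function is to trace links between tables, whether the link is direct or has multiple degrees of separation. The recursion level is a fixed integer value.
My problem is that whenever I try to run this function at a search depth of two levels (I don't dare try deeper at this stage), the job either runs out of memory or I run out of patience. On one run I waited 17 minutes before the job ran out of memory.
The average number of columns per table is 28, with a standard deviation of 34.
Here is a diagram showing examples of the various links that can be made between tables:
Here is my code: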
    private void FindLinkingTables(List<TableColumns> sourceList, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level < maxSearchDepth)
        {
            IEnumerable<string> tableColumns = sourceList.Where(x => x.Table.Equals(parentNode.Table)).Select(x => x.Column);
            foreach (string sourceColumn in tableColumns)
            {
                string shortName = sourceColumn.Substring(1);
                IEnumerable<TableSearchNode> tables = sourceList.Where(
                    x => x.Column.Substring(1).Equals(shortName) && !x.Table.Equals(parentNode.Table) && !parentNode.Ancenstory.Contains(x.Table)).Select(
                    x => new TableSearchNode { Table = x.Table, Column = x.Column, Level = parentNode.Level + 1 });
                foreach (TableSearchNode table in tables)
                {
                    parentNode.AddChildNode(sourceColumn, table);
                    if (!table.Table.Equals(targetTable))
                    {
                        FindLinkingTables(sourceList, table, targetTable, maxSearchDepth);
                    }
                    else
                    {
                        table.NotifySeachResult(true);
                    }
                }
            }
        }
    }
Edit: separated out the TableSearchNode logic and added its properties and methods for completeness.
    //TableSearchNode
    public Dictionary<string, List<TableSearchNode>> Children { get; private set; }
    //TableSearchNode
    public List<string> Ancenstory
    {
        get
        {
            Stack<string> ancestory = new Stack<string>();
            TableSearchNode ancestor = ParentNode;
            while (ancestor != null)
            {
                ancestory.Push(ancestor.tbl);
                ancestor = ancestor.ParentNode;
            }
            return ancestory.ToList();
        }
    }
    //TableSearchNode
    public void AddChildNode(string referenceColumn, TableSearchNode childNode)
    {
        childNode.ParentNode = this;
        List<TableSearchNode> relatedTables = null;
        Children.TryGetValue(referenceColumn, out relatedTables);
        if (relatedTables == null)
        {
            relatedTables = new List<TableSearchNode>();
            Children.Add(referenceColumn, relatedTables);
        }
        relatedTables.Add(childNode);
    }
Thanks in advance for your help!
Answer 0 (score: 4)
You are really wasting a lot of memory. Here is what comes to mind immediately:
First, replace the incoming List<TableColumns> sourceList with an ILookup<string, TableColumns>. You should do this once, before calling FindLinkingTables:
    ILookup<string, TableColumns> sourceLookup = sourceList.ToLookup(s => s.Table);
    FindLinkingTables(sourceLookup, parentNode, targetTable, maxSearchDepth);
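To make the effect of the lookup concrete, here is a minimal sketch. The TableColumns class below is an assumed stand-in with only Table and Column string properties (the question does not show its real definition), and LookupDemo, BuildTableLookup, SampleData and the sample table names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of TableColumns; the real class is not shown in the question.
public class TableColumns
{
    public string Table { get; set; } = "";
    public string Column { get; set; } = "";
}

public static class LookupDemo
{
    // Groups the rows by table name once. Each later access by key is a
    // hash lookup instead of a scan over the whole list.
    public static ILookup<string, TableColumns> BuildTableLookup(IEnumerable<TableColumns> source)
    {
        return source.ToLookup(s => s.Table);
    }

    // Hypothetical sample rows, only for the demonstration.
    public static List<TableColumns> SampleData()
    {
        return new List<TableColumns>
        {
            new TableColumns { Table = "Orders", Column = "OCustomerId" },
            new TableColumns { Table = "Orders", Column = "OProductId" },
            new TableColumns { Table = "Customers", Column = "CCustomerId" },
        };
    }
}
```

Indexing the lookup with a table name that does not exist yields an empty sequence rather than throwing, so the calling code needs no extra key check.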
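Don't call .ToList() unless you really need it. For example, if you only want to enumerate all the children of the resulting list, you don't need it. So your main function would look like this: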
    private void FindLinkingTables(ILookup<string, TableColumns> sourceLookup, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level < maxSearchDepth)
        {
            var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
            foreach (string sourceColumn in tableColumns)
            {
                string shortName = sourceColumn.Substring(1);
                var tables = sourceLookup
                    .Where(
                        group => !group.Key.Equals(parentNode.Table)
                            && !parentNode.Ancenstory.Contains(group.Key))
                    .SelectMany(group => group)
                    .Where(tableColumn => tableColumn.Column.Substring(1).Equals(shortName))
                    .Select(
                        x => new TableSearchNode
                        {
                            Table = x.Table,
                            Column = x.Column,
                            Level = parentNode.Level + 1
                        });
                foreach (TableSearchNode table in tables)
                {
                    parentNode.AddChildNode(sourceColumn, table);
                    if (!table.Table.Equals(targetTable))
                    {
                        FindLinkingTables(sourceLookup, table, targetTable, maxSearchDepth);
                    }
                    else
                    {
                        table.NotifySeachResult(true);
                    }
                }
            }
        }
    }
[Edit]
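Also, to speed up the remaining complex LINQ query, you can prepare another ILookup, keyed by the column name without its first character:
    ILookup<string, TableColumns> sourceColumnLookup = sourceList
        .ToLookup(t => t.Column.Substring(1));
    //...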
    private void FindLinkingTables(
        ILookup<string, TableColumns> sourceLookup,
        ILookup<string, TableColumns> sourceColumnLookup,
        TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level >= maxSearchDepth) return;
        var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
        foreach (string sourceColumn in tableColumns)
        {
            string shortName = sourceColumn.Substring(1);
            var tables = sourceColumnLookup[shortName]
                .Where(tableColumn => !tableColumn.Table.Equals(parentNode.Table)
                    && !parentNode.AncenstoryReversed.Contains(tableColumn.Table))
                .Select(
                    x => new TableSearchNode
                    {
                        Table = x.Table,
                        Column = x.Column,
                        Level = parentNode.Level + 1
                    });
            foreach (TableSearchNode table in tables)
            {
                parentNode.AddChildNode(sourceColumn, table);
                if (!table.Table.Equals(targetTable))
                {
                    FindLinkingTables(sourceLookup, sourceColumnLookup, table, targetTable, maxSearchDepth);
                }
                else
                {
                    table.NotifySeachResult(true);
                }
            }
        }
    }
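As a rough illustration of what the column lookup buys you, here is a small sketch. It assumes the same minimal TableColumns shape as in the question's usage (Table and Column strings); ColumnLookupDemo, BuildColumnLookup and LinkedTables are hypothetical names, not part of the question's code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of TableColumns; the real class is not shown in the question.
public class TableColumns
{
    public string Table { get; set; } = "";
    public string Column { get; set; } = "";
}

public static class ColumnLookupDemo
{
    // Index every row by its column name without the one-character prefix,
    // mirroring the Substring(1) convention used in the question.
    public static ILookup<string, TableColumns> BuildColumnLookup(IEnumerable<TableColumns> source)
    {
        return source.ToLookup(t => t.Column.Substring(1));
    }

    // Names of the other tables that share a column with the given row.
    // Only the rows under one key are touched, not the whole source list.
    public static List<string> LinkedTables(
        ILookup<string, TableColumns> byShortName, TableColumns from)
    {
        return byShortName[from.Column.Substring(1)]
            .Where(t => t.Table != from.Table)
            .Select(t => t.Table)
            .Distinct()
            .ToList();
    }
}
```

With the lookup built once up front, finding candidate tables for a column costs one hash lookup plus a pass over the handful of rows that share that short name.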
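I have also looked at your Ancenstory property. If an IEnumerable<string> is sufficient for your needs, consider this implementation, which walks up the parent chain lazily instead of building a new list on every call: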
    public IEnumerable<string> AncenstoryEnum
    {
        get { return AncenstoryReversed.Reverse(); }
    }
    public IEnumerable<string> AncenstoryReversed
    {
        get
        {
            TableSearchNode ancestor = ParentNode;
            while (ancestor != null)
            {
                yield return ancestor.tbl;
                ancestor = ancestor.ParentNode;
            }
        }
    }
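The benefit of the iterator version is that Contains can stop as soon as it finds a match, and nothing is allocated per call. A minimal sketch of that behaviour follows; Node, its Parent property and the Visited counter are hypothetical stand-ins for TableSearchNode, added only to make the laziness observable.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TableSearchNode, reduced to what the demo needs.
public class Node
{
    public string Table { get; set; } = "";
    public Node Parent { get; set; }

    // Demo-only counter of how many ancestors were actually walked.
    public static int Visited;

    // Yields the nearest ancestor first and stops as soon as the caller stops asking.
    public IEnumerable<string> AncestorsReversed
    {
        get
        {
            for (Node a = Parent; a != null; a = a.Parent)
            {
                Visited++;
                yield return a.Table;
            }
        }
    }
}
```

For a chain A, B, C, D (each the parent of the next), calling D.AncestorsReversed.Contains("C") walks a single ancestor, whereas a property that builds a full List<string> first always walks the whole chain.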
Answer 1 (score: 2)
I managed to refactor your FindLinkingTables code down to this:
    private void FindLinkingTables(
        List<TableColumns> sourceList, TableSearchNode parentNode,
        string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level < maxSearchDepth)
        {
            var sames = sourceList.Where(w => w.Table == parentNode.Table);
            var query =
                from x in sames
                join y in sames
                    on x.Column.Substring(1) equals y.Column.Substring(1)
                where !parentNode.Ancenstory.Contains(y.Table)
                select new TableSearchNode
                {
                    Table = x.Table,
                    Column = x.Column,
                    Level = parentNode.Level + 1
                };
            foreach (TableSearchNode z in query)
            {
                parentNode.AddChildNode(z.Column, z);
                if (z.Table != targetTable)
                {
                    FindLinkingTables(sourceList, z, targetTable, maxSearchDepth);
                }
                else
                {
                    z.NotifySeachResult(true);
                }
            }
        }
    }
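It appears to me that the logic in the where !parentNode.Ancenstory.Contains(y.Table) part of the query is flawed. I think you need to rethink your search operation here and see what you come up with.
Answer 2 (score: 2)
A few things stand out to me when looking at this source method:
In your Where clause, you make a call to parentNode.Ancenstory; this by itself has logarithmic running time, and then you call .Contains on the List<string> it returns, which is another logarithmic call (it is linear in the list length, but the list holds only a logarithmic number of elements).
What you are doing here is checking for cycles in your graph. These costs can be made constant by adding a field to TableColumns.Table that stores information about how the Table has been processed by the algorithm (alternatively, you could use a Dictionary<Table, int> to avoid adding a field to the object). Typically, in a DFS algorithm, this field is White, Grey, or Black: White means unprocessed (you have not seen that Table before), Grey means the Table is an ancestor of the Table currently being processed, and Black means you have finished processing that Table and all of its children. Updated to do this, your code would look like: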
    foreach (string sourceColumn in tableColumns)
    {
        string shortName = sourceColumn.Substring(1);
        // Color is the per-table marker described above; it is assumed to be
        // shared by every entry that refers to the same Table
        IEnumerable<TableSearchNode> tables =
            sourceList.Where(x => x.Column.Substring(1).Equals(shortName) &&
                                  x.Color == White)
                      .Select(x => new TableSearchNode
                      {
                          Table = x.Table,
                          Column = x.Column,
                          Level = parentNode.Level + 1
                      });
        foreach (TableSearchNode table in tables)
        {
            parentNode.AddChildNode(sourceColumn, table);
            table.Color = Grey;
            if (!table.Table.Equals(targetTable))
            {
                FindLinkingTables(sourceList, table, targetTable, maxSearchDepth);
            }
            else
            {
                table.NotifySeachResult(true);
            }
            table.Color = Black;
        }
    }
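For reference, the White/Grey/Black scheme can be sketched on its own, separate from the question's classes. The version below is a simplified illustration under assumptions: the graph is a plain Dictionary<string, List<string>> of table names (not the question's TableColumns list), colors live in a side dictionary as suggested above, and there is no depth limit. ColoredDfs and FindPath are hypothetical names.

```csharp
using System.Collections.Generic;

public enum Color { White, Grey, Black }

public static class ColoredDfs
{
    // Returns one path from start to target (not necessarily the shortest),
    // or null if the target cannot be reached.
    public static List<string> FindPath(
        Dictionary<string, List<string>> graph, string start, string target)
    {
        var color = new Dictionary<string, Color>();
        var path = new List<string>();
        return Visit(graph, start, target, color, path) ? path : null;
    }

    private static bool Visit(
        Dictionary<string, List<string>> graph, string current, string target,
        Dictionary<string, Color> color, List<string> path)
    {
        color[current] = Color.Grey;          // on the path being explored
        path.Add(current);
        if (current == target) return true;

        List<string> neighbours;
        if (graph.TryGetValue(current, out neighbours))
        {
            foreach (string next in neighbours)
            {
                Color c;
                // A name missing from the dictionary counts as White (never seen).
                if (color.TryGetValue(next, out c) && c != Color.White) continue;
                if (Visit(graph, next, target, color, path)) return true;
            }
        }

        color[current] = Color.Black;         // fully explored, no route from here
        path.RemoveAt(path.Count - 1);
        return false;
    }
}
```

The color check is a single dictionary lookup per neighbour, which is what replaces the walk up the ancestry list. Note that each table is visited at most once, so this finds a path but does not enumerate every alternative route the way the question's tree does.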
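As mentioned above, you are running out of memory. The simplest fix is to remove the recursive call (which acts as an implicit stack) and replace it with an explicit Stack data structure. Additionally, this changes the recursion into a loop, which C# is better at optimizing.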
    private void FindLinkingTables(List<TableColumns> sourceList, TableSearchNode root, string targetTable, int maxSearchDepth)
    {
        Stack<TableSearchNode> stack = new Stack<TableSearchNode>();
        TableSearchNode current;
        stack.Push(root);
        while (stack.Count > 0)
        {
            current = stack.Pop();
            // Enforce the depth limit on the node's level; the stack size is not the search depth
            if (current.Level < maxSearchDepth)
            {
                var tableColumns = sourceList.Where(x => x.Table.Equals(current.Table))
                                             .Select(x => x.Column);
                foreach (string sourceColumn in tableColumns)
                {
                    string shortName = sourceColumn.Substring(1);
                    IEnumerable<TableSearchNode> tables =
                        sourceList.Where(x => x.Column.Substring(1).Equals(shortName) &&
                                              x.Color == White)
                                  .Select(x => new TableSearchNode
                                  {
                                      Table = x.Table,
                                      Column = x.Column,
                                      Level = current.Level + 1
                                  });
                    foreach (TableSearchNode table in tables)
                    {
                        current.AddChildNode(sourceColumn, table);
                        if (!table.Table.Equals(targetTable))
                        {
                            table.Color = Grey;
                            stack.Push(table);
                        }
                        else
                        {
                            // you could go ahead and construct the ancestry list here using the stack
                            table.NotifySeachResult(true);
                            return;
                        }
                    }
                }
            }
            current.Color = Black;
        }
    }
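Finally, we don't know how costly Table.Equals is, but if the comparison is deep then it could add significant running time to your inner loop.
Answer 3 (score: 2)
OK, here is an answer that basically abandons all of the code you have posted.
First, you should take your List<TableColumns> and hash its entries into something that can be indexed without having to iterate over the entire list.
For this purpose, I wrote a class called TableColumnIndexer: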
    class TableColumnIndexer
    {
        Dictionary<string, HashSet<string>> tables = new Dictionary<string, HashSet<string>>();
        public void Add(string tableName, string columnName)
        {
            this.Add(new TableColumns { Table = tableName, Column = columnName });
        }
        public void Add(TableColumns tableColumns)
        {
            if(! tables.ContainsKey(tableColumns.Table))
            {
                tables.Add(tableColumns.Table, new HashSet<string>());
            }
            tables[tableColumns.Table].Add(tableColumns.Column);
        }
    // .... More code to follow
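Now, once you have loaded all of your table/column values into this indexing class, you can call a recursive method to retrieve the shortest ancestry link between two tables. The implementation here is somewhat sloppy, but it is written this way for the sake of clarity: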
    // .... continuation of TableColumnIndexer class
        public List<string> GetShortestAncestry(string parentName, string targetName, int maxDepth)
        {
            return GetSortestAncestryR(parentName, targetName, maxDepth - 1, 0, new Dictionary<string,int>());
        }
        private List<string> GetSortestAncestryR(string currentName, string targetName, int maxDepth, int currentDepth, Dictionary<string, int> vistedTables)
        {
            // A name that was never added as a table has no entries to follow
            if (!tables.ContainsKey(currentName))
                return null;
            // Check if we have visited this table before
            if (!vistedTables.ContainsKey(currentName))
                vistedTables.Add(currentName, currentDepth);
            // Make sure we have not visited this table at a shallower depth before
            if (vistedTables[currentName] < currentDepth)
                return null;
            else
                vistedTables[currentName] = currentDepth;
            if (currentDepth <= maxDepth)
            {
                List<string> result = new List<string>();
                // First check if the current table contains a reference to the target table
                if (tables[currentName].Contains(targetName))
                {
                    result.Add(currentName);
                    result.Add(targetName);
                    return result;
                }
                // If not try to see if any of the children tables have the target table
                else
                {
                    List<string> bestResult = null;
                    int bestDepth = int.MaxValue;
                    foreach (string childTable in tables[currentName])
                    {
                        var tempResult = GetSortestAncestryR(childTable, targetName, maxDepth, currentDepth + 1, vistedTables);
                        // Keep only the shortest path found to the target table
                        if (tempResult != null && tempResult.Count < bestDepth)
                        {
                            bestDepth = tempResult.Count;
                            bestResult = tempResult;
                        }
                    }
                    // Take the best link we found and add it to the result list
                    if (bestDepth < int.MaxValue && bestResult != null)
                    {
                        result.Add(currentName);
                        result.AddRange(bestResult);
                        return result;
                    }
                    // If we did not find any result, return nothing
                    else
                    {
                        return null;
                    }
                }
            }
            else
            {
                return null;
            }
        }
    }
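All of this code is just a (somewhat verbose) implementation of a shortest-path algorithm that allows for circular paths and multiple paths between the source table and the target table. Note that if there are two routes of the same depth between two tables, the algorithm will select only one of them (and not necessarily predictably).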
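For comparison, a breadth-first search is a common alternative for finding a shortest chain, because it reaches tables in order of increasing distance and can stop at the first hit. The sketch below is a swapped-in technique, not the recursive method above. It assumes a Dictionary<string, HashSet<string>> shaped like the tables field, where each value is treated as the set of tables directly linked from the key, and it counts maxDepth in hops; BfsShortestPath and Find are hypothetical names.

```csharp
using System.Collections.Generic;

public static class BfsShortestPath
{
    // Returns the chain of table names from start to target using at most
    // maxDepth hops, or null if no such chain exists.
    public static List<string> Find(
        Dictionary<string, HashSet<string>> links, string start, string target, int maxDepth)
    {
        // parent doubles as the visited set; the start table has no parent.
        var parent = new Dictionary<string, string> { { start, null } };
        var depth = new Dictionary<string, int> { { start, 0 } };
        var queue = new Queue<string>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            if (current == target)
            {
                // Walk the parent links back to the start, then flip the order.
                var path = new List<string>();
                for (string n = current; n != null; n = parent[n]) path.Add(n);
                path.Reverse();
                return path;
            }
            if (depth[current] >= maxDepth) continue;   // do not expand past the limit

            HashSet<string> next;
            if (!links.TryGetValue(current, out next)) continue;
            foreach (string n in next)
            {
                if (parent.ContainsKey(n)) continue;     // already reached by a path at least as short
                parent[n] = current;
                depth[n] = depth[current] + 1;
                queue.Enqueue(n);
            }
        }
        return null;
    }
}
```

Each table is enqueued at most once, so memory stays proportional to the number of tables rather than to the number of paths. As with the recursive version, when several shortest chains exist only one of them is returned.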