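I have a set of fields that represent rows in an Excel document, and I'm trying to speed up some LINQ statements that process them. I can express the solution easily in SQL, but I'm struggling to come up with a performant LINQ solution. Here is the query I wrote to demonstrate the problem.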
DECLARE @T TABLE(RowNumber INT, FieldName NVARCHAR(25), FieldValue NVARCHAR(25))
INSERT @T( RowNumber, FieldName, FieldValue ) VALUES
(1,'F1','100'),(1,'F2','A'),(1,'F3','A'),
(2,'F1','200'),(2,'F2','A'),(2,'F3','A'),
(3,'F1','300'),(3,'F2','A'),(3,'F3','A'),
(4,'F1','400'),(4,'F2','A'),(4,'F3','A'),
(5,'F1','100'),(5,'F2','B'),(5,'F3','B'),
(6,'F1','100'),(6,'F2','C'),(6,'F3','B'),
(7,'F1','200'),(7,'F2','B'),(7,'F3','B'),
(8,'F1','100'),(8,'F2','A'),(8,'F3','A'),
(9,'F1','100'),(9,'F2','A'),(9,'F3','A'),
(10,'F1','400'),(10,'F2','A'),(10,'F3','A')
;WITH Flattended AS
(
    SELECT
        RowNumber,
        F1 = MAX(F1),
        F2 = MAX(F2),
        F3 = MAX(F3)
    FROM
    (
        SELECT
            RowNumber,
            F1 = CASE WHEN FieldName = 'F1' THEN FieldValue ELSE NULL END,
            F2 = CASE WHEN FieldName = 'F2' THEN FieldValue ELSE NULL END,
            F3 = CASE WHEN FieldName = 'F3' THEN FieldValue ELSE NULL END
        FROM @T
    ) AS A
    GROUP BY RowNumber
),
FlattenedGrouped AS
(
    SELECT F1, F2, F3
    FROM Flattended
    GROUP BY F1, F2, F3
    HAVING COUNT(*) > 1
)
SELECT *
FROM Flattended F
INNER JOIN FlattenedGrouped FG ON FG.F1 = F.F1 AND FG.F2 = F.F2 AND FG.F3 = F.F3
In real life, the collection looks like this:
public class Cell
{
    public int RowNumber;
    public string ColumnName;
    public string ColumnValue;
}

public class ThisThing
{
    public List<Cell> Cells;
}
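I need to find all Cell RowNumbers for rows whose cells include the field names 'F1', 'F2' and 'F3', and whose F1, F2 and F3 values all match those of at least one other row.
In the Excel scenario below, RowNumbers 1 and 10 would be returned.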
RowNumber F1 F2 F3
1 100 A A
2 200 A A
3 300 A A
4 400 A A
5 500 A A
6 100 A B
7 600 A A
8 700 A A
9 800 A A
10 100 A A
Here is the LINQ I'm working on:
var _allFirstFieldMatches = properties.Where(p => p.Column == "F1").ToList()
    .GroupBy(p => p.Value)
    .Where(p => p.Count() > 1)
    .Select(p => new
    {
        RowNumber = p.Min(o => o.RowNumber),
        F1 = p.Min(o => o.Value)
    });

var _allFirstAndSecondFieldMatches = properties
    .Where(p => p.Column == "F2" && _allFirstFieldMatches.Any(p1 => p1.RowNumber == p.RowNumber)).ToList()
    .GroupBy(p => p.Value)
    .Where(p => p.Count() > 1)
    .Select(p => new
    {
        RowNumber = p.Min(o => o.RowNumber),
        F2 = p.Min(o => o.Value)
    });

var _allFirstAndSecondAndThirsFieldMatches = properties
    .Where(p => p.Column == "F3" && _allFirstAndSecondFieldMatches.Any(p1 => p1.RowNumber == p.RowNumber)).ToList()
    .GroupBy(p => p.Value)
    .Where(p => p.Count() > 1)
    .Select(p => new
    {
        RowNumber = p.Min(o => o.RowNumber),
        F3 = p.Min(o => o.Value)
    });
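One likely contributor to the slowness is deferred execution: `_allFirstFieldMatches` is never materialized, so the `.Any(...)` inside the next `Where` re-runs the whole F1 `Where`/`ToList`/`GroupBy` pipeline for every cell, which is roughly quadratic. The `p.Min(o => o.RowNumber)` projection also keeps only one row per duplicate group, so the other matching rows are lost. Below is a minimal sketch of materializing the F1 step once into a `HashSet<int>` of all rows whose F1 value is shared. It assumes `properties` can be modeled as a list of `(RowNumber, Column, Value)` value tuples with made-up sample rows, and `ToHashSet` needs .NET Core 2.0+ or .NET Framework 4.7.2+. Filtering column by column like this only narrows the candidates; checking that all three values match within the same pair of rows still needs a composite key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (RowNumber, Column, Value) tuples standing in for `properties`.
var properties = new List<(int RowNumber, string Column, string Value)>
{
    (1, "F1", "100"), (1, "F2", "A"),
    (2, "F1", "200"), (2, "F2", "A"),
    (3, "F1", "100"), (3, "F2", "B"),
};

// Materialize once: every row (not just the minimum) whose F1 value appears in more than one row.
HashSet<int> rowsWithDuplicateF1 = properties
    .Where(p => p.Column == "F1")
    .GroupBy(p => p.Value)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g.Select(p => p.RowNumber))
    .ToHashSet();

// O(1) membership test per cell instead of re-running a grouped query through Any().
var f2Candidates = properties
    .Where(p => p.Column == "F2" && rowsWithDuplicateF1.Contains(p.RowNumber))
    .ToList();

Console.WriteLine(string.Join(", ", f2Candidates.Select(p => p.RowNumber))); // 1, 3
```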
Second attempt:
var _field1Duplicates = (from o in properties
                         where o.Column.Equals("F1", StringComparison.InvariantCultureIgnoreCase)
                         group o by o.Value into g
                         select new
                         {
                             DuplicateCount = g.Count(),
                             Value = g.Key
                         })
                        .ToList().Where(p => p.DuplicateCount > 1);

var _dupField1Objects = (from o in properties
                         where o.Column.Equals("F2", StringComparison.InvariantCultureIgnoreCase)
                         join b in _field1Duplicates on o.Value equals b.Value
                         select new
                         {
                             RowNumber = o.RowNumber,
                             F1 = o.Value,
                             F2 = properties.Where(p => p.RowNumber == o.RowNumber && p.Column == "F2").FirstOrDefault().Value,
                             F3 = properties.Where(p => p.RowNumber == o.RowNumber && p.Column == "F3").FirstOrDefault().Value
                         }).ToList();
Answer (score: 3)
You can translate your SQL query into LINQ almost literally:
// building data
var source = new ThisThing() { Cells = new List<Cell>() };
var f1 = new[] { "100", "200", "300", "400", "500", "100", "600", "700", "800", "100" };
var f2 = new[] { "A", "A", "A", "A", "A", "A", "A", "A", "A", "A" };
var f3 = new[] { "A", "A", "A", "A", "A", "B", "A", "A", "A", "A" };
for (int i = 1; i <= 10; i++) {
    source.Cells.Add(new Cell() { RowNumber = i, ColumnName = "F1", ColumnValue = f1[i - 1] });
    source.Cells.Add(new Cell() { RowNumber = i, ColumnName = "F2", ColumnValue = f2[i - 1] });
    source.Cells.Add(new Cell() { RowNumber = i, ColumnName = "F3", ColumnValue = f3[i - 1] });
}

// normalize, same as in the SQL query
// note: the query is not materialized yet
var normalized = source.Cells.Select(c => new {
    c.RowNumber,
    F1 = c.ColumnName == "F1" ? c.ColumnValue : null,
    F2 = c.ColumnName == "F2" ? c.ColumnValue : null,
    F3 = c.ColumnName == "F3" ? c.ColumnValue : null
});

// flatten, again an almost literal translation
// the query still has not executed
var flattened = normalized.GroupBy(c => c.RowNumber).Select(c => new {
    RowNumber = c.Key,
    F1 = c.Max(r => r.F1),
    F2 = c.Max(r => r.F2),
    F3 = c.Max(r => r.F3),
});

// again an almost literal translation
// at the end, ToArray() finally executes the query
var result = flattened
    .GroupBy(c => new { c.F1, c.F2, c.F3 })
    .Where(c => c.Count() > 1)
    .SelectMany(c => c.Select(r => r.RowNumber))
    .ToArray();
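The normalize step above projects every cell into a three-field object and then calls `Max` three times per row. A possible alternative, sketched here under stated assumptions, pivots each row into a dictionary once and groups on a value-tuple key. Assumptions: each row has at most one cell per column name (`ToDictionary` throws on duplicates); rows missing F1, F2 or F3 are skipped, per the requirement; cells are modeled as value tuples mirroring `Cell` so the block compiles as a single top-level program (C# 9+); `FindDuplicateRows` is a hypothetical helper name. Whether it is actually faster than the answer's version has not been measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same sample data as the answer, as (RowNumber, ColumnName, ColumnValue) tuples mirroring Cell.
var cells = new List<(int RowNumber, string ColumnName, string ColumnValue)>();
string[] f1 = { "100", "200", "300", "400", "500", "100", "600", "700", "800", "100" };
string[] f2 = { "A", "A", "A", "A", "A", "A", "A", "A", "A", "A" };
string[] f3 = { "A", "A", "A", "A", "A", "B", "A", "A", "A", "A" };
for (int i = 1; i <= 10; i++)
{
    cells.Add((i, "F1", f1[i - 1]));
    cells.Add((i, "F2", f2[i - 1]));
    cells.Add((i, "F3", f3[i - 1]));
}

// Hypothetical helper: returns the row numbers whose (F1, F2, F3) values occur in more than one row.
int[] FindDuplicateRows(IEnumerable<(int RowNumber, string ColumnName, string ColumnValue)> source)
{
    return source
        // Pivot: one column-name -> value dictionary per row (assumes unique column names per row).
        .GroupBy(c => c.RowNumber)
        .Select(g => (Row: g.Key, Values: g.ToDictionary(c => c.ColumnName, c => c.ColumnValue)))
        // Keep only rows that actually contain all three fields.
        .Where(r => r.Values.ContainsKey("F1") && r.Values.ContainsKey("F2") && r.Values.ContainsKey("F3"))
        // Value tuples compare component-wise (ordinal for strings), so they work as a grouping key.
        .GroupBy(r => (r.Values["F1"], r.Values["F2"], r.Values["F3"]))
        .Where(g => g.Count() > 1)
        .SelectMany(g => g.Select(r => r.Row))
        .OrderBy(n => n)
        .ToArray();
}

var duplicates = FindDuplicateRows(cells);
Console.WriteLine(string.Join(", ", duplicates)); // 1, 10
```

The tuple key plays the same role as the anonymous type `new { c.F1, c.F2, c.F3 }` in the answer; both rely on structural equality rather than reference equality.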