I have a table with about half a million records and I need to find the duplicates. So I'm using this code I wrote:
var dups2 = from m in mg_B
group m by new { m.Addr1, m.Addr2, m.City, m.State }
into g
where g.Count() > 1
select g;
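The problem with this code is that it doesn't treat two records as duplicates when Addr1 is an empty string "" in one of them and NULL in the other.
Basically, when comparing the fields it treats null and empty values as different, but I need them to be treated as the same.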
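A minimal LINQ to Objects sketch of the behavior described above. The group key is a C# anonymous type, and anonymous types compare each property with the default equality for its type; for strings, `null` is not equal to `""`. The rows below are hypothetical sample data, not the asker's table.

```csharp
using System;
using System.Linq;

// Two hypothetical rows that differ only in Addr1: null in one, "" in the other.
var rows = new[]
{
    new { Addr1 = (string)null, City = "Springfield" },
    new { Addr1 = "", City = "Springfield" },
};

// Anonymous types compare property by property with the default string
// equality, and null is not equal to "", so the two values are different.
bool sameKey = rows[0].Equals(rows[1]);
Console.WriteLine(sameKey); // False

// As a result, grouping puts each row in its own group and no duplicate is found.
int groupCount = rows.GroupBy(r => new { r.Addr1, r.City }).Count();
Console.WriteLine(groupCount); // 2
```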
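I know I could loop through every record and replace the nulls with "", but that took me 1 minute for 4,000 records, and this will run again every time someone clicks a button.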
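I ran into this null vs. empty string problem because I originally created a class with only some of the fields (the table has more than 40 fields):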
List<CombineClass> mg = (from m in db.MG_Backup
                         where m.IsArchived == false
                         select new CombineClass
                         {
                             id = m.ID,
                             name = m.Name,
                             addr1 = string.IsNullOrEmpty(m.Addr1) ? "" : m.Addr1,
                             addr2 = string.IsNullOrEmpty(m.Addr2) ? "" : m.Addr2,
                             city = m.City,
                             state = m.State
                         }).ToList();
Any ideas?
Answer 0 (score: 2)
This version is compatible with Linq-to-Sql / Linq-to-Entities:
var dups2 = from m in mg_B
group m by new
{
Addr1 = m.Addr1 ?? string.Empty,
Addr2 = m.Addr2 ?? string.Empty,
City = m.City ?? string.Empty,
State = m.State ?? string.Empty,
}
into g
where g.Count() > 1
select g;
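A self-contained way to check this query without a database, assuming a small hypothetical in-memory array in place of `mg_B` (so it runs as LINQ to Objects, not LINQ to SQL). Coalescing each field with `?? string.Empty` gives the row with a null Addr1 and the row with an empty Addr1 the same key, so they land in one group.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for mg_B.
// The first two differ only by null vs "" in Addr1.
var mg_B = new[]
{
    new { Addr1 = (string)null, Addr2 = "Apt 1", City = "Springfield", State = "IL" },
    new { Addr1 = "", Addr2 = "Apt 1", City = "Springfield", State = "IL" },
    new { Addr1 = "1 Main St", Addr2 = (string)null, City = "Springfield", State = "IL" },
};

var dups2 = (from m in mg_B
             group m by new
             {
                 // Map null to "" so both spellings of "no value" build the same key.
                 Addr1 = m.Addr1 ?? string.Empty,
                 Addr2 = m.Addr2 ?? string.Empty,
                 City = m.City ?? string.Empty,
                 State = m.State ?? string.Empty,
             }
             into g
             where g.Count() > 1
             select g).ToList();

foreach (var g in dups2)
    Console.WriteLine($"{g.Count()} rows share key {g.Key}");
```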
The generated SQL looks roughly like this:
-- Parameters
DECLARE @p0 NVarChar(1000) = ''
DECLARE @p1 NVarChar(1000) = ''
DECLARE @p2 NVarChar(1000) = ''
DECLARE @p3 NVarChar(1000) = ''
DECLARE @p4 Int = 1
SELECT [t2].[value2] AS [Addr1], [t2].[value22] AS [Addr2], [t2].[value3] AS [City], [t2].[value4] AS [State]
FROM (
SELECT COUNT(*) AS [value], [t1].[value] AS [value2], [t1].[value2] AS [value22], [t1].[value3], [t1].[value4]
FROM (
SELECT COALESCE([t0].[Addr1],@p0) AS [value], COALESCE([t0].[Addr2],@p1) AS [value2], COALESCE([t0].[City],@p2) AS [value3], COALESCE([t0].[State],@p3) AS [value4]
FROM [SettingSystemNodes] AS [t0]
) AS [t1]
GROUP BY [t1].[value], [t1].[value2], [t1].[value3], [t1].[value4]
) AS [t2]
WHERE [t2].[value] > @p4
Note that if you assign string.Empty to a local variable before the query, or even to a let variable, only one parameter will be used for the empty string.
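The single-parameter effect is something the answer reports for the SQL that LINQ to SQL generates, so it can't be observed in memory. This sketch only shows the two query shapes the note mentions (a local captured by the query, and a query-level `let`) and checks that both still find the duplicate pair. The rows are hypothetical in-memory stand-ins for `mg_B`.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for mg_B.
var mg_B = new[]
{
    new { Addr1 = (string)null, Addr2 = "Apt 1", City = "Springfield", State = "IL" },
    new { Addr1 = "", Addr2 = "Apt 1", City = "Springfield", State = "IL" },
};

// Variant 1: hoist the empty string into a local that the query captures.
string empty = string.Empty;
var viaLocal = (from m in mg_B
                group m by new
                {
                    Addr1 = m.Addr1 ?? empty,
                    Addr2 = m.Addr2 ?? empty,
                    City = m.City ?? empty,
                    State = m.State ?? empty,
                }
                into g
                where g.Count() > 1
                select g).ToList();

// Variant 2: introduce the empty string with a `let` clause inside the query.
var viaLet = (from m in mg_B
              let e = string.Empty
              group m by new
              {
                  Addr1 = m.Addr1 ?? e,
                  Addr2 = m.Addr2 ?? e,
                  City = m.City ?? e,
                  State = m.State ?? e,
              }
              into g
              where g.Count() > 1
              select g).ToList();

Console.WriteLine($"{viaLocal.Count} / {viaLet.Count}");
```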
Answer 1 (score: 0)
Here's the brute-force way:
var dups2 = from m in mg_B
group m by new {
Addr1 = (string.IsNullOrEmpty(m.Addr1) ? "" : m.Addr1),
Addr2 = (string.IsNullOrEmpty(m.Addr2) ? "" : m.Addr2),
City = (string.IsNullOrEmpty(m.City) ? "" : m.City ),
State = (string.IsNullOrEmpty(m.State) ? "" : m.State),
...
}
into g
where g.Count() > 1
select g;
If you want the code to look cleaner, you can use an extension method on string:
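// Extension methods must live in a static, non-generic class (class name is illustrative).
public static class StringExtensions
{
    public static string EmptyForNull(this string s)
    {
        return string.IsNullOrEmpty(s) ? "" : s;
    }
}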
Then your query would be:
var dups2 = from m in mg_B
group m by new {
Addr1 = m.Addr1.EmptyForNull(),
Addr2 = m.Addr2.EmptyForNull(),
City = m.City.EmptyForNull(),
State = m.State.EmptyForNull(),
...
}
into g
where g.Count() > 1
select g;
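However, doing this in SQL rather than in LINQ would probably be faster. Also note that a custom method such as EmptyForNull cannot be translated to SQL by LINQ to SQL or Entity Framework, so this version only works when mg_B is already an in-memory collection.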