LINQ to SQL: find duplicates, but treat null and empty string as the same

Time: 2013-04-03 13:29:00

Tags: c# linq null duplicates

I have a table with half a million records, and I need to find the duplicates. So I am using this code I wrote:

var dups2 = from m in mg_B
    group m by new { m.Addr1, m.Addr2, m.City, m.State }
    into g
    where g.Count() > 1
    select g;

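The problem with this code is that it does not treat two records as duplicates when Addr1 is an empty string "" in one and NULL in the other.

Basically, when a null field is compared with an empty one, they are treated as different, but I need them to be treated as the same.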
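The behavior described above can be reproduced in memory. The following is a minimal sketch using LINQ to Objects rather than the real table; the `Rec` type, the `NullVsEmptyDemo` class, and the sample rows are hypothetical and only model two of the fields.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table row; only two fields are modeled.
public class Rec
{
    public int Id { get; set; }
    public string Addr1 { get; set; }
    public string City { get; set; }
}

public static class NullVsEmptyDemo
{
    // Rows 1 and 2 differ only in Addr1 being "" versus null.
    public static readonly List<Rec> Rows = new List<Rec>
    {
        new Rec { Id = 1, Addr1 = "", City = "Austin" },
        new Rec { Id = 2, Addr1 = null, City = "Austin" },
        new Rec { Id = 3, Addr1 = "1 Main St", City = "Austin" },
    };

    // Grouping on the raw values: null and "" produce different keys.
    public static int CountDuplicateGroupsRaw()
    {
        return Rows.GroupBy(r => new { r.Addr1, r.City })
                   .Count(g => g.Count() > 1);
    }

    // Grouping on coalesced values: null and "" produce the same key.
    public static int CountDuplicateGroupsCoalesced()
    {
        return Rows.GroupBy(r => new { Addr1 = r.Addr1 ?? "", City = r.City ?? "" })
                   .Count(g => g.Count() > 1);
    }
}
```

With these sample rows, the raw grouping finds no duplicate group, while the coalesced grouping puts rows 1 and 2 in the same group.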

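I know I could go through every record and replace the nulls with "", but that took 1 minute for 4,000 records, and this will be run repeatedly, whenever someone clicks a button.

I discovered this null versus empty string problem because I had originally created a class with only some of the fields (the table has more than 40 fields).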

List<CombineClass> mg = (from m in db.MG_Backup
   where m.IsArchived == false
   select new CombineClass { id = m.ID, name = m.Name, addr1 = string.IsNullOrEmpty(m.Addr1) ? "" : m.Addr1, addr2 = string.IsNullOrEmpty(m.Addr2) ? "" : m.Addr2, city = m.City, state = m.State }).ToList(); 

Any ideas?

2 answers:

Answer 0: (score: 2)

This version is compatible with Linq-to-Sql / Linq-to-Entities:

var dups2 = from m in mg_B
    group m by new 
    { 
        Addr1 = m.Addr1 ?? string.Empty, 
        Addr2 = m.Addr2 ?? string.Empty, 
        City  = m.City ?? string.Empty, 
        State = m.State ?? string.Empty,
    }
    into g
    where g.Count() > 1
    select g;

The generated SQL looks something like this:

-- Parameters
DECLARE @p0 NVarChar(1000) = ''
DECLARE @p1 NVarChar(1000) = ''
DECLARE @p2 NVarChar(1000) = ''
DECLARE @p3 NVarChar(1000) = ''
DECLARE @p4 Int = 1

SELECT [t2].[value2] AS [Addr1], [t2].[value22] AS [Addr2], [t2].[value3] AS [City], [t2].[value4] AS [State]
FROM (
    SELECT COUNT(*) AS [value], [t1].[value] AS [value2], [t1].[value2] AS [value22], [t1].[value3], [t1].[value4]
    FROM (
        SELECT COALESCE([t0].[Addr1],@p0) AS [value], COALESCE([t0].[Addr2],@p1) AS [value2], COALESCE([t0].[City],@p2) AS [value3], COALESCE([t0].[State],@p3) AS [value4]
        FROM [SettingSystemNodes] AS [t0]
        ) AS [t1]
    GROUP BY [t1].[value], [t1].[value2], [t1].[value3], [t1].[value4]
    ) AS [t2]
WHERE [t2].[value] > @p4

Note that if you assign string.Empty to a local variable before the query, or even to a let variable, only one parameter will be used for the empty string.
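A minimal sketch of the local-variable variant mentioned in the note above. The `AddressRow` type and the `SingleParameterSketch` class are hypothetical, and the method takes any `IQueryable<AddressRow>` so it can be tried against an in-memory list. The single-parameter effect is the answer's claim about the generated SQL; it cannot be observed in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the real entity.
public class AddressRow
{
    public string Addr1 { get; set; }
    public string Addr2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

public static class SingleParameterSketch
{
    // Returns the size of every group that contains more than one row.
    public static List<int> DuplicateGroupSizes(IQueryable<AddressRow> source)
    {
        // One local, captured by every ?? expression,
        // instead of four separate uses of string.Empty.
        string empty = string.Empty;

        var dups = from m in source
                   group m by new
                   {
                       Addr1 = m.Addr1 ?? empty,
                       Addr2 = m.Addr2 ?? empty,
                       City = m.City ?? empty,
                       State = m.State ?? empty,
                   }
                   into g
                   where g.Count() > 1
                   select g.Count();

        return dups.ToList();
    }
}
```

To try it without a database, pass `rows.AsQueryable()` for some `List<AddressRow>`. Against a real data context, the table itself would be passed instead.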

Answer 1: (score: 0)

Here is the brute-force way:

var dups2 = from m in mg_B
    group m by new { 
        Addr1 = (string.IsNullOrEmpty(m.Addr1) ? "" : m.Addr1), 
        Addr2 = (string.IsNullOrEmpty(m.Addr2) ? "" : m.Addr2), 
        City  = (string.IsNullOrEmpty(m.City)  ? "" : m.City ), 
        State = (string.IsNullOrEmpty(m.State) ? "" : m.State),
        ...
        }
    into g
    where g.Count() > 1
    select g;

If you want the code to look cleaner, you can use an extension method on string:

public static class StringExtensions // extension methods must live in a static class
{
    public static string EmptyForNull(this string s)
    {
        return string.IsNullOrEmpty(s) ? "" : s;
    }
}

Then your query would be:

var dups2 = from m in mg_B
    group m by new { 
        Addr1 = m.Addr1.EmptyForNull(), 
        Addr2 = m.Addr2.EmptyForNull(), 
        City  = m.City.EmptyForNull(), 
        State = m.State.EmptyForNull(),
        ...
        }
    into g
    where g.Count() > 1
    select g;
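A minimal, self-contained sketch of the extension-method approach, run over an in-memory sequence. The `Place` type and the class names are hypothetical. It deliberately takes `IEnumerable<Place>`: LINQ to SQL generally cannot translate a user-defined method such as `EmptyForNull` into SQL, so against a database table the rows would probably have to be brought into memory first (for example with `AsEnumerable()`).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StringExtensionsSketch
{
    // Returns "" for a null or empty string, otherwise the string itself.
    public static string EmptyForNull(this string s)
    {
        return string.IsNullOrEmpty(s) ? "" : s;
    }
}

// Hypothetical row type; only two of the fields are modeled.
public class Place
{
    public string City { get; set; }
    public string State { get; set; }
}

public static class ExtensionGroupingSketch
{
    // Counts the groups that contain more than one row.
    public static int CountDuplicateGroups(IEnumerable<Place> rows)
    {
        // IEnumerable<T> means LINQ to Objects, where calling a
        // custom method inside the grouping key is allowed.
        var dups = from m in rows
                   group m by new
                   {
                       City = m.City.EmptyForNull(),
                       State = m.State.EmptyForNull(),
                   }
                   into g
                   where g.Count() > 1
                   select g;

        return dups.Count();
    }
}
```

Calling an extension method on a null string is safe here, because it compiles to an ordinary static method call with the null passed as an argument.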

However, this would probably be faster if done in SQL rather than in Linq.