Removing empty string values and ignoring case in a C# List when using the Distinct() method

Time: 2017-02-20 16:50:04

Tags: c# linq

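I have managed to remove most of the duplicate values in my list, but duplicates that differ only by case are still there, and I also want to remove the empty string values from the list.

CategoriesList yields about 1000 records; noDuplicateCategories cuts that number to 20 by removing most of the duplicates: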

var CSVCategories = from line in File.ReadAllLines(path).Skip(1)
                    let columns = line.Split(',')
                    select new Category
                    {
                        Name = columns[9]
                    };

var CategoriesList = CSVCategories.ToList();

var noDuplicateCategories = CategoriesList.Distinct(new CategoryComparer()).ToList();

Here is my comparer class, which implements the methods of the IEqualityComparer interface:

class CategoryComparer : IEqualityComparer<Category>
{
    // Products are equal if their names and product numbers are equal.
    public bool Equals(Category x, Category y)
    {

        //Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        //Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null ) || Object.ReferenceEquals(y, null))
            return false;

        //Check whether the products' properties are equal.
        return string.Compare(x.Name, y.Name, true) == 0;
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.

    public int GetHashCode(Category category)
    {
        //Check whether the object is null
        if (Object.ReferenceEquals(category, null)) return 0;

        //Get hash code for the Name field if it is not null.
        int hashCategoryName = category.Name == null ? 0 : category.Name.GetHashCode();

        //Get hash code for the Code field.
        int hashCategoryCode = category.Name.GetHashCode();

        //Calculate the hash code for the product.
        return hashCategoryName;
    }

}

What do I need to change here to remove the empty string values and ignore case?

1 answer:

Answer 0 (score: 3)

If all you need is unique names, why work with Category objects at all? You can prepare the names before converting them into categories:

var categories = File.ReadLines(path).Skip(1)
           .Select(l => l.Split(new [] {','}, StringSplitOptions.RemoveEmptyEntries))
           .Where(parts => parts.Length >= 10)
           .Select(parts => parts[9].Trim())
           .Distinct(StringComparer.InvariantCultureIgnoreCase)
           .Select(s => new Category { Name = s });

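Of course, if you are quite sure the data in the file is reliable (no empty lines, at least 10 parts on every line, and no whitespace around any part), you can simplify the query: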

var categories = File.ReadLines(path).Skip(1)
           .Select(l => l.Split(',')[9])
           .Distinct(StringComparer.InvariantCultureIgnoreCase)
           .Select(s => new Category { Name = s });
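
One detail of the first query may be worth checking against your data: StringSplitOptions.RemoveEmptyEntries drops empty fields, so if an earlier column on a line is blank, index 9 no longer points at the original tenth column. The sketch below is one possible variant, not the only way to do it: it splits without that option, so positions stay fixed, and filters out blank names afterwards. The CategoryNames class and its DistinctNames helper are hypothetical names introduced for illustration, and the input is assumed to be the data lines with the header already skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CategoryNames
{
    // Hypothetical helper: takes the CSV data lines (header already skipped)
    // and returns the distinct, non-blank values of the tenth column.
    public static List<string> DistinctNames(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(','))      // keep empty fields so column positions stay fixed
            .Where(parts => parts.Length >= 10)   // skips blank lines and lines that are too short
            .Select(parts => parts[9].Trim())
            .Where(name => name.Length > 0)       // drop empty names
            .Distinct(StringComparer.InvariantCultureIgnoreCase)
            .ToList();
    }
}
```

Passing File.ReadLines(path).Skip(1) as the input and projecting the result with .Select(s => new Category { Name = s }) would give the same shape of output as the queries above.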

Note: ReadLines is used instead of ReadAllLines to avoid loading the entire file contents into an in-memory array.
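
If you would rather keep the custom comparer from the question, the remaining lowercase duplicates are most likely explained by its GetHashCode. Distinct groups items by hash code before it ever calls Equals, and string.GetHashCode() is case-sensitive, so "Books" and "books" usually end up with different hash codes and are never compared. The following is a minimal sketch of a comparer whose two methods agree with each other. It assumes Category has only the Name property shown in the question (the Category class here is a stand-in), it uses ordinal case-insensitive comparison as an assumption (a culture-aware StringComparer could be substituted), and CategoryCleanup.DistinctNonEmpty is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's type; only the Name property is assumed.
public class Category
{
    public string Name { get; set; }
}

public class CategoryComparer : IEqualityComparer<Category>
{
    // One StringComparer drives both methods, so Equals and GetHashCode cannot disagree.
    private static readonly StringComparer NameComparer = StringComparer.OrdinalIgnoreCase;

    public bool Equals(Category x, Category y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return NameComparer.Equals(x.Name, y.Name);
    }

    public int GetHashCode(Category category)
    {
        // Null objects and null names hash to 0.
        if (category?.Name == null) return 0;
        return NameComparer.GetHashCode(category.Name);
    }
}

public static class CategoryCleanup
{
    // Hypothetical helper: drops blank names first, then removes case-insensitive duplicates.
    public static List<Category> DistinctNonEmpty(IEnumerable<Category> categories)
    {
        return categories
            .Where(c => c != null && !string.IsNullOrWhiteSpace(c.Name))
            .Distinct(new CategoryComparer())
            .ToList();
    }
}
```

Filtering empty names is kept outside the comparer on purpose: a comparer only decides whether two items are equal, so removing items is a job for Where.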