Question

Possible duplicate:
Remove duplicates from a List<T> in C#

I have a list like the one below (a very large list of emails).
Source list:

item 0 : jumper@yahoo.com|32432  
item 1 : goodzila@yahoo.com|32432|test23  
item 2 : alibaba@yahoo.com|32432|test65  
item 3 : blabla@yahoo.com|32432|test32

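The important part of each item is the email address. The other parts (separated by pipes) are not important, but I want to keep them in the final list.
As I said, my list is large, so I don't think using a second list is advisable.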

How can I remove duplicate emails (the whole item) from the list without using LINQ?
My code is below:

private void WorkOnFile(UploadedFile file, string filePath)
{
    File.SetAttributes(filePath, FileAttributes.Archive);

    FileSecurity fSecurity = File.GetAccessControl(filePath);
    fSecurity.AddAccessRule(new FileSystemAccessRule(@"Everyone",
                                                    FileSystemRights.FullControl,
                                                    AccessControlType.Allow));
    File.SetAccessControl(filePath, fSecurity);

    string[] lines = File.ReadAllLines(filePath);
    List<string> list_lines = new List<string>(lines);
    var new_lines = list_lines.Select(line => string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)));
    List<string> new_list_lines = new List<string>(new_lines);
    int Duplicate_Count = 0;
    RemoveDuplicates(ref new_list_lines, ref Duplicate_Count);
    File.WriteAllLines(filePath, new_list_lines.ToArray());
}

private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
    char[] splitter = { '|' };
    list_lines.ForEach(delegate(string line)
    {
        // ??
    });
}

Edit:
Some duplicate email addresses in the list have different other parts.
How should I handle them?
I mean cases like:

goodzila@yahoo.com|32432|test23   
and   
goodzila@yahoo.com|asdsa|324234

Thanks in advance.

Answer 1

Say you have a list that may contain duplicates:

List<string> emailList ....

Then the unique list is simply a set built from that list:

HashSet<string> unique = new HashSet<string>(emailList);
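
Note that a plain HashSet<string> compares whole lines, so two lines with the same email but different trailing fields would both survive. One possible way around that, sketched below, is to pass a custom comparer to the set. This is not part of the original answer: `EmailKeyComparer` and `KeyOf` are hypothetical names, and the sketch assumes the email is always the text before the first '|' and that an exact (ordinal) match is what counts as a duplicate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two lines are "equal" when their email parts match.
// Assumption: the email is everything before the first '|'.
public sealed class EmailKeyComparer : IEqualityComparer<string>
{
    public static string KeyOf(string line)
    {
        int pipe = line.IndexOf('|');
        return pipe < 0 ? line : line.Substring(0, pipe);
    }

    public bool Equals(string x, string y)
    {
        if (x == null || y == null) return x == y;
        return string.Equals(KeyOf(x), KeyOf(y), StringComparison.Ordinal);
    }

    public int GetHashCode(string line)
    {
        return line == null ? 0 : StringComparer.Ordinal.GetHashCode(KeyOf(line));
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var emailList = new List<string>
        {
            "goodzila@yahoo.com|32432|test23",
            "goodzila@yahoo.com|asdsa|324234",
            "jumper@yahoo.com|32432"
        };

        // The set keeps the first line seen for each email and ignores later ones.
        var unique = new HashSet<string>(emailList, new EmailKeyComparer());
        Console.WriteLine(unique.Count); // prints 2
    }
}
```

With this comparer, which of the two goodzila lines survives is decided purely by input order, so it only fits if "first one wins" is an acceptable rule.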

Answer 2

private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
    Duplicate_Count = 0;
    List<string> list_lines2 = new List<string>();   // lines kept, in original order
    HashSet<string> hash = new HashSet<string>();    // emails seen so far

    foreach (string line in list_lines)
    {
        // The email is the part before the first '|'.
        string[] split = line.Split('|');
        string firstPart = split.Length > 0 ? split[0] : string.Empty;

        // Add returns false if the email is already in the set.
        if (hash.Add(firstPart))
        {
            list_lines2.Add(line);
        }
        else
        {
            Duplicate_Count++;
        }
    }

    list_lines = list_lines2;
}
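
If allocating a second list is a concern, as the question suggests, the same idea can be applied in place. The following is only a sketch, not the answer author's code: it uses `List<T>.RemoveAll` (a method of `List<T>`, not LINQ) with the same `HashSet` check. It assumes that `RemoveAll` passes elements to the predicate in list order, which is how the current implementation behaves but is worth verifying for your runtime. `RemoveDuplicatesInPlace` is a hypothetical name. The `HashSet` of emails still costs memory, just less than a full copy of the lines.

```csharp
using System;
using System.Collections.Generic;

public static class InPlaceDedup
{
    // Removes every line whose email (text before the first '|') was already seen.
    // Returns the number of removed lines.
    public static int RemoveDuplicatesInPlace(List<string> lines)
    {
        var seen = new HashSet<string>();
        // HashSet.Add returns false for a key that is already present,
        // so the predicate is true exactly for duplicates.
        return lines.RemoveAll(line => !seen.Add(line.Split('|')[0]));
    }

    public static void Main()
    {
        var lines = new List<string>
        {
            "jumper@yahoo.com|32432",
            "goodzila@yahoo.com|32432|test23",
            "goodzila@yahoo.com|asdsa|324234"
        };

        int removed = RemoveDuplicatesInPlace(lines);
        Console.WriteLine(removed + " duplicate(s) removed, " + lines.Count + " line(s) kept");
    }
}
```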

Answer 3

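The simplest approach is to iterate over the lines in the file and add them to a HashSet. A HashSet does not insert duplicate entries, and it does not throw an exception when you try to add one. At the end you will have a collection of unique items.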
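
A minimal sketch of that loop follows. The sample lines are made up, and `Collect` is a hypothetical helper name; in practice the lines would come from the file. Note that this compares whole lines, so it only removes exact duplicates.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDemo
{
    // Adds every line to a set; duplicates are silently skipped.
    public static HashSet<string> Collect(IEnumerable<string> lines)
    {
        var unique = new HashSet<string>();
        foreach (string line in lines)
        {
            // Add returns true if the line was inserted, false if it was already present.
            bool added = unique.Add(line);
            Console.WriteLine(line + " -> " + (added ? "added" : "skipped"));
        }
        return unique;
    }

    public static void Main()
    {
        string[] lines = { "a@example.com|1", "b@example.com|2", "a@example.com|1" };
        HashSet<string> unique = Collect(lines);
        Console.WriteLine(unique.Count); // prints 2
    }
}
```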

Answer 4

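1 - Get rid of the pipe-delimited strings (create a DTO class that corresponds to the data they represent).

2 - Which rule do you want to apply to choose between two objects with the same ID?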
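
As a sketch of point 1, a small DTO could look like the following. `EmailRecord`, its members, and `KeepFirst` are hypothetical; the answer does not give a concrete class. For point 2, the example assumes a "first one wins" rule, which is only one possible choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical DTO for one line of the file.
public sealed class EmailRecord
{
    public string Email { get; private set; }
    public string[] Extras { get; private set; } // the remaining pipe-separated fields

    public static EmailRecord Parse(string line)
    {
        string[] parts = line.Split('|');
        string[] extras = new string[parts.Length - 1];
        Array.Copy(parts, 1, extras, 0, extras.Length);
        return new EmailRecord { Email = parts[0], Extras = extras };
    }
}

public static class DtoDemo
{
    // Assumed rule: keep the first record seen for each email.
    public static List<EmailRecord> KeepFirst(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var result = new List<EmailRecord>();
        foreach (string line in lines)
        {
            EmailRecord record = EmailRecord.Parse(line);
            if (seen.Add(record.Email))
            {
                result.Add(record);
            }
        }
        return result;
    }

    public static void Main()
    {
        var records = KeepFirst(new[]
        {
            "goodzila@yahoo.com|32432|test23",
            "goodzila@yahoo.com|asdsa|324234"
        });
        Console.WriteLine(records.Count + " record(s), extras: " + string.Join(",", records[0].Extras));
    }
}
```

A different rule (for example, keeping the record with the most fields) would only change the body of the loop in `KeepFirst`.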

Answer 5

Maybe this code will be useful to you :) It uses the same approach as @xanatos's answer.

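string[] lines = File.ReadAllLines(filePath);
// Maps each email (the text before the first '|') to the first line that contained it.
Dictionary<string, string> items = new Dictionary<string, string>();

foreach (var line in lines)
{
    var key = line.Split('|')[0];
    if (!items.ContainsKey(key))
        items.Add(key, line);
}
// Copy the surviving lines into a list without LINQ.
List<string> list_lines = new List<string>(items.Values);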

Answer 6

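First, I suggest loading the file through a stream. Then create a type that represents a line and load the lines into a HashSet (for performance reasons).

Take a look (I have removed some code to simplify it):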

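public struct LineType
{
    public string Email { get; set; }
    public string Others { get; set; }

    // Two lines are considered equal when their emails match.
    public override bool Equals(object obj)
    {
        return obj is LineType && string.Equals(this.Email, ((LineType)obj).Email);
    }

    // Must be consistent with Equals, otherwise HashSet lookups are unreliable.
    public override int GetHashCode()
    {
        return this.Email == null ? 0 : this.Email.GetHashCode();
    }
}
private static void WorkOnFile(string filePath)
{
    HashSet<LineType> hashSet = new HashSet<LineType>();

    // Dispose the reader even if an exception is thrown.
    using (StreamReader stream = File.OpenText(filePath))
    {
        string line;
        while ((line = stream.ReadLine()) != null)
        {
            string new_line = string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));

            LineType lineType = new LineType()
            {
                // The email is the first pipe-separated field.
                Email = new_line.Split('|')[0],
                Others = new_line
            };

            // Add does nothing (and returns false) when an equal item is already in the set.
            hashSet.Add(lineType);
        }
    }
}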

How do I remove duplicates from a List<string> without LINQ?
