Checking an array of objects for uniqueness

Time: 2016-05-17 06:17:13

Tags: c#

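I am reading data from files (for example CSV and Excel) and need to make sure that every row in the file is unique.

Each row is represented as an object[]. Because of the current architecture, this cannot be changed. Each object in the array can be of a different type (decimal, string, int, etc.).

A file can look like this: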

foo    1      5 // Not unique
bar    1      5
bar    2      5
foo    1      5 // Not unique

A file may have more than 200,000 rows and 4 to 100 columns.

My current code looks like this:

IList<object[]> rows = new List<object[]>();

using (var reader = _deliveryObjectReaderFactory.CreateReader(deliveryObject))
{
    // Read the row.
    while (reader.Read())
    {
        // Get the values from the file.
        var values = reader.GetValues();

        // Check uniqueness for row
        foreach (var row in rows)
        {
            bool rowsAreDifferent = false;

            // Check uniqueness for column.
            for (int i = 0; i < row.Length; i++)
            {
                var earlierValue = row[i];
                var newValue = values[i];
                if (earlierValue.ToString() != newValue.ToString())
                {
                    rowsAreDifferent = true;
                    break;
                }
            }
            if(!rowsAreDifferent)
                throw new Exception("Rows are not unique");
        }
        rows.Add(values);
    }
}

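So my question is: can this be done more efficiently? For example, by hashing each row and checking the hashes for uniqueness?

1 answer:

Answer 0: (score: 4)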

You can use a HashSet<object[]> with a custom IEqualityComparer<object[]>:

HashSet<object[]> rows = new HashSet<object[]>(new MyComparer());
while (reader.Read())
{
    // Get the values from the file.
    var values = reader.GetValues();

    // Add returns false when an equal row is already in the set.
    if (!rows.Add(values))
        throw new Exception("Rows are not unique");
}

MyComparer can be implemented like this:

using System.Collections.Generic;
using System.Linq;

public class MyComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null) || x.Length != y.Length) return false;

        // Compare element by element. object.Equals is null-safe and
        // compares boxed values by value, not by reference.
        return x.Zip(y, (a, b) => object.Equals(a, b)).All(c => c);
    }

    public int GetHashCode(object[] obj)
    {
        unchecked
        {
            // this returns 0 if obj is null
            // otherwise it combines the hashes of all elements
            // like hash = (hash * 397) ^ nextHash
            // if an array element is null its hash is assumed as 0
            // (this is the ReSharper suggestion for GetHashCode implementations)
            return obj?.Aggregate(0, (hash, o) => (hash * 397) ^ (o?.GetHashCode() ?? 0)) ?? 0;
        }
    }
}

I was not completely sure whether comparing the elements with a == b works for all types. It does not: on operands typed as object, == compares references, so separately boxed values such as int or decimal never match. That is why the comparer calls object.Equals(a, b).
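
On the element comparison: when both operands are typed as object, == checks reference identity, so two separately boxed values with the same content are reported as different, while the static object.Equals compares them by value. The following is a minimal sketch of that difference, not the answerer's own code. The names ValueRowComparer and BoxedEqualityDemo are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the comparer that compares elements by value.
public sealed class ValueRowComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null || x.Length != y.Length) return false;

        for (int i = 0; i < x.Length; i++)
        {
            // object.Equals handles nulls and dispatches to the virtual
            // Equals, so two boxed ints with the same value are equal.
            if (!object.Equals(x[i], y[i])) return false;
        }
        return true;
    }

    public int GetHashCode(object[] obj)
    {
        if (obj is null) return 0;
        unchecked
        {
            // Same combining scheme as in the answer: (hash * 397) ^ next.
            int hash = 0;
            foreach (var o in obj)
                hash = (hash * 397) ^ (o?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

// Hypothetical helper that isolates the two kinds of comparison.
public static class BoxedEqualityDemo
{
    // True only when both operands are the same object instance.
    public static bool ReferenceCompare(object a, object b) => a == b;

    // True when the (possibly boxed) values are equal.
    public static bool ValueCompare(object a, object b) => object.Equals(a, b);
}
```

One design point to keep in mind: value equality is type-sensitive, so an int 1 and a decimal 1 are not equal under object.Equals, whereas the ToString() comparison in the question treats them as the same. If the same column can arrive with different types, the values would need to be normalized before they are added to the set.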