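I am reading data from files (e.g. CSV and Excel) and need to make sure that every row in a file is unique.
Each row is represented as an object[]. This cannot be changed because of the current architecture. Each object in the array can be of a different type (decimal, string, int, etc.).
A file can look like this: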
foo 1 5 // Not unique
bar 1 5
bar 2 5
foo 1 5 // Not unique
A file may have more than 200,000 rows and 4-100 columns.
My current code looks like this:
IList<object[]> rows = new List<object[]>();
using (var reader = _deliveryObjectReaderFactory.CreateReader(deliveryObject))
{
    // Read the row.
    while (reader.Read())
    {
        // Get the values from the file.
        var values = reader.GetValues();
        // Check uniqueness for row
        foreach (var row in rows)
        {
            bool rowsAreDifferent = false;
            // Check uniqueness for column.
            for (int i = 0; i < row.Length; i++)
            {
                var earlierValue = row[i];
                var newValue = values[i];
                if (earlierValue.ToString() != newValue.ToString())
                {
                    rowsAreDifferent = true;
                    break;
                }
            }
            if (!rowsAreDifferent)
                throw new Exception("Rows are not unique");
        }
        rows.Add(values);
    }
}
So, my question: can this be done more efficiently? For example, by hashing each row and checking the hashes for uniqueness?
Answer 0 (score: 4)
You can use a HashSet<object[]> together with a custom IEqualityComparer<object[]>:
HashSet<object[]> rows = new HashSet<object[]>(new MyComparer());
while (reader.Read())
{
    // Get the values from the file.
    var values = reader.GetValues();
    if (!rows.Add(values))
        throw new Exception("Rows are not unique");
}
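MyComparer can be implemented like this:
using System.Collections.Generic;
using System.Linq;

public class MyComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null) || x.Length != y.Length) return false;
        // object.Equals is null-safe and calls the element's own Equals override
        return x.Zip(y, (a, b) => object.Equals(a, b)).All(c => c);
    }
    public int GetHashCode(object[] obj)
    {
        unchecked
        {
            // this returns 0 if obj is null
            // otherwise it combines the hashes of all elements
            // like hash = (hash * 397) ^ nextHash
            // if an array element is null its hash is assumed as 0
            // (this is the ReSharper suggestion for GetHashCode implementations)
            return obj?.Aggregate(0, (hash, o) => (hash * 397) ^ (o?.GetHashCode() ?? 0)) ?? 0;
        }
    }
}
The elements are compared with object.Equals(a, b) rather than a == b. With operands typed as object, == only compares references, so two separately boxed values such as 1 and 1 would be treated as different and duplicate rows would go undetected.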
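One difference from the loop in the question may matter: that loop compares cells through ToString(), so an int 1 and a string "1" count as the same value, whereas a comparer that compares cells by value and type would treat them as different. If the text-based behaviour is what is wanted, a comparer along the following lines might work. This is only a sketch: TextRowComparer, RowUniqueness and IndexOfFirstDuplicate are hypothetical names, and treating a null cell as an empty string is an assumption (the loop in the question would throw on a null cell).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two rows are equal when every cell has the same text form.
public sealed class TextRowComparer : IEqualityComparer<object[]>
{
    // Assumption: a null cell is treated like an empty string.
    private static string Text(object cell)
    {
        return cell?.ToString() ?? string.Empty;
    }

    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            // Ordinal comparison of the text forms, like != on strings in the question.
            if (!string.Equals(Text(x[i]), Text(y[i]), StringComparison.Ordinal))
                return false;
        }
        return true;
    }

    public int GetHashCode(object[] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            int hash = 0;
            // Hash the same text forms that Equals compares, so equal rows share a hash.
            foreach (var cell in obj)
                hash = (hash * 397) ^ Text(cell).GetHashCode();
            return hash;
        }
    }
}

public static class RowUniqueness
{
    // Returns the zero-based index of the first row that repeats an earlier row, or -1.
    public static int IndexOfFirstDuplicate(IEnumerable<object[]> rows)
    {
        var seen = new HashSet<object[]>(new TextRowComparer());
        int index = 0;
        foreach (var row in rows)
        {
            if (!seen.Add(row)) return index;
            index++;
        }
        return -1;
    }
}
```

Passing new TextRowComparer() to the HashSet<object[]> constructor in place of MyComparer would keep the reading loop unchanged. The trade-off is one ToString() call per cell for hashing, plus more when rows collide, which is still far less work than comparing each new row against every earlier row.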