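I want to insert a batch of documents, some of which already exist in the collection. I would like either to ignore the existing ones or, better for my case, to log an exception identifying which document is a duplicate and, if possible, continue inserting the next document.
I have seen several similar questions, but none of them solves this problem.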
MongoDB Bulk Insert Ignore Duplicate
MongoDB: how to insert document without repeat
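I created my own hash property because the unique key of my documents spans multiple fields, so I concatenate those fields and then compute a hash of the result.
My code looks like this: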
const string connectionString = "mongodb://127.0.0.1/localdb";
var client = new MongoClient(connectionString);
_database = client.GetDatabase("localdb");
var collection = _database.GetCollection<BsonDocument>("Sales");

// Build a hash over the six fields that together form the unique key.
foreach (var data in dataCollectionDict)
{
    var customValue = new StringBuilder();
    customValue.Append(data["col1"]);
    customValue.Append(data["col2"]);
    customValue.Append(data["col3"]);
    customValue.Append(data["col4"]);
    customValue.Append(data["col5"]);
    customValue.Append(data["col6"]);
    data.AddRange(new BsonDocument("HashMultipleKey", SHA256Func(customValue.ToString())));
}

// Unique index on the hash so that the server rejects duplicate documents.
await collection.Indexes.CreateOneAsync(new BsonDocument("HashMultipleKey", 1), new CreateIndexOptions { Unique = true, Sparse = true });
await collection.InsertManyAsync(dataCollectionDict);
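
A side note on the hash itself: the question does not show `SHA256Func`, and appending the six values with nothing between them means that two different rows can produce the same string (for example "ab" + "c" and "a" + "bc"). The following is only a minimal sketch of one way to avoid that, assuming the key fields can be treated as strings. `CompositeKeyHash` and `Compute` are hypothetical names, not part of the original code or of the MongoDB driver.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CompositeKeyHash
{
    // Hypothetical stand-in for the SHA256Func helper that the question does not show.
    public static string Compute(params string[] parts)
    {
        var sb = new StringBuilder();
        foreach (var part in parts)
        {
            // Assumption: a null field is treated the same as an empty string.
            var value = part ?? string.Empty;
            // The length prefix keeps ("ab", "c") distinct from ("a", "bc").
            sb.Append(value.Length).Append(':').Append(value);
        }

        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            // Lowercase hex string; SHA-256 yields 64 hex characters.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

In the loop above this could be used as `data["HashMultipleKey"] = CompositeKeyHash.Compute(data["col1"].ToString(), data["col2"].ToString(), ...)`. Existing documents hashed with the old scheme would need to be rehashed for the unique index to stay meaningful.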
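Any help would be greatly appreciated.
Answer 0 (score: 1)
This is what I found to work. I am not sure it is the best solution, and I would be happy to hear if you have a better approach.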
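    try
    {
        await collection.InsertManyAsync(dataCollectionDict);
    }
    catch (MongoBulkWriteException<BsonDocument> ex)
    {
        // The bulk insert stopped at a failed write; log it and retry the documents one by one.
        ApplicationInsights.Instance.TrackException(ex);
        InsertSingleDocuments(dataCollectionDict, collection, dataCollectionQueueMessage);
    }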
}
private static void InsertSingleDocuments(
    List<BsonDocument> dataCollectionDict,
    IMongoCollection<BsonDocument> collection,
    DataCollectionQueueMessage dataCollectionQueueMessage)
{
    ApplicationInsights.Instance.TrackEvent("About to insert the documents individually to find the duplicates");
    foreach (var data in dataCollectionDict)
    {
        try
        {
            collection.InsertOne(data);
        }
        // Only duplicate-key violations are swallowed; any other failure propagates.
        catch (MongoWriteException ex) when (ex.WriteError?.Category == ServerErrorCategory.DuplicateKey)
        {
            ApplicationInsights.Instance.TrackException(ex, new Dictionary<string, string>
            {
                { "Error Message", "Duplicate document detected; ignoring it and continuing with the next document" },
                { "FilePath", dataCollectionQueueMessage.FilePath }
            });
        }
    }
}
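
A possible alternative to retrying document by document is an unordered bulk insert: with `IsOrdered = false` the server keeps inserting after a failed document and reports all failures together in a `MongoBulkWriteException<BsonDocument>`. The sketch below assumes the unique index from the question is already in place. `BulkInsertSketch` and `InsertSkippingDuplicatesAsync` are hypothetical names, and it relies on `BulkWriteError.Index` referring to the position in the input list, which is how the driver behaves as far as I know.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class BulkInsertSketch
{
    // Inserts every document it can and returns the ones rejected as duplicates.
    public static async Task<List<BsonDocument>> InsertSkippingDuplicatesAsync(
        IMongoCollection<BsonDocument> collection,
        IReadOnlyList<BsonDocument> documents)
    {
        var duplicates = new List<BsonDocument>();
        try
        {
            // IsOrdered = false: the server continues after a failed document.
            await collection.InsertManyAsync(
                documents, new InsertManyOptions { IsOrdered = false });
        }
        catch (MongoBulkWriteException<BsonDocument> ex)
        {
            // Rethrow if anything other than a duplicate-key violation happened.
            if (ex.WriteConcernError != null ||
                ex.WriteErrors.Any(e => e.Category != ServerErrorCategory.DuplicateKey))
            {
                throw;
            }

            // Assumption: Index is the failed document's position in the input list.
            duplicates.AddRange(ex.WriteErrors.Select(e => documents[e.Index]));
        }
        return duplicates;
    }
}
```

The returned list can then be logged in one place, for example with the same `TrackException` call used above, and a single round trip replaces the per-document inserts. It also avoids reporting documents as duplicates merely because the first, ordered attempt had already inserted them.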