Error when writing to MongoDB using parallelism

Time: 2014-09-04 11:50:19

Tags: c# mongodb mongodb-.net-driver parallel.foreach parallels

I have a collection in Mongo that contains subdocuments. I read XML files and save them to MongoDB; each XML file becomes one document in Mongo.

My classes

public class Header
{
    public Header()
    {
        Operations = new List<Operation>();
    }

    public ObjectId Id { get; set; }
    public Int64 Code1 {get; set;}
    public Int64 Code2 {get; set;}
    public string Name { get; set; }
    public List<Operation> Operations { get; set; }
}

public class Operation
{
    public Operation()
    {
        Itens = new List<Item>();
    }

    public string Value { get; set; }
    public List<Item> Itens { get; set; }
}

public class Item
{
    public string Value { get; set; }
}

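In the Header class, Code1 and Code2 (Codigo1 and Codigo2 in the index name) are used to create the index on the documents in MongoDB. Code1 and Code2 together make up the XML file name, and since all the files are in the same folder, duplicates should be impossible.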

Saving to MongoDB

I am using the following code to save to Mongo:

var po = new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount > 1 ? Environment.ProcessorCount / 2 : 1 };

Parallel.ForEach(arquivos, po, (arquivo, state) =>
{

    MongoCollection<Header> collection = MongoConnect.GetHeader();

            try
            {
                var Header = new Header();
                Header.Name = @"Valor 1";
                Header.Code1 = arquivo.Name.Split('-').Count() > 1 ? Int64.Parse((arquivo.Name.Split('-')[1]).Replace(".", "")) : 0;
                Header.Code2 = arquivo.Name.Split('-').Count() > 2 ? Int64.Parse((arquivo.Name.Split('-')[2]).Replace(".siag", "")) : 0;

                var body = record.SelectSingleNode("body");
                if (body != null)
                {
                    string[] linhas = body.InnerText.Split(new String[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
                    foreach (var linha in linhas)
                    {
                        string conteudo = linha;
                        var operation = new Operation();
                        if (!conteudo.Contains("\t"))
                        {
                            string tipo = conteudo.Substring(0, conteudo.IndexOf(' ')).Trim();
                            string tabela = conteudo.Substring(0, conteudo.IndexOf("Quando:", System.StringComparison.Ordinal)).Trim();
                            operation.Value = tabela;
                            conteudo = conteudo.Remove(0, (tabela + " Quando:").Length);

                            Header.Operations.Add(operation);
                        }
                        else
                        {
                            var item = new Item();
                            string[] campos = conteudo.Split('\t');
                            item.Value = campos[0];

                            Header.Operations.Last().Itens.Add(item);
                        }
                    }

                    try
                    {
                        collection.Save(Header);
                    }
                    catch (Exception ex)
                    {
                        // The duplicate key error shows up here
                    }
                }
            }
            catch (Exception ex)
            {
                //Log Error Here
            }

});

Note: please disregard the file-reading part; it is only there for illustration.
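Since the unique index is built from the two codes parsed out of the file name, one way to test the "duplicates are impossible" assumption is to parse every name up front and look for pairs that repeat. The sketch below is only an illustration: `FileCodes`, `TryParse` and `FindDuplicates` are hypothetical names, and the `<prefix>-<code1>-<code2>.siag` layout is inferred from the `Split('-')` calls in the loop above. Because the dots are stripped from the first code, two different file names can in principle collapse to the same pair; whether that happens in the real data is not known.

```csharp
using System;
using System.Collections.Generic;

public static class FileCodes
{
    // Parses "<prefix>-<code1>-<code2>.siag" into (code1, code2).
    // Returns false instead of silently falling back to 0.
    public static bool TryParse(string fileName, out long code1, out long code2)
    {
        code1 = 0;
        code2 = 0;
        string[] parts = fileName.Split('-');
        if (parts.Length < 3) return false;
        return long.TryParse(parts[1].Replace(".", ""), out code1)
            && long.TryParse(parts[2].Replace(".siag", ""), out code2);
    }

    // Returns the file names whose (code1, code2) pair was already seen.
    public static List<string> FindDuplicates(IEnumerable<string> fileNames)
    {
        var seen = new HashSet<Tuple<long, long>>();
        var duplicates = new List<string>();
        foreach (string name in fileNames)
        {
            long c1, c2;
            if (!TryParse(name, out c1, out c2)) continue; // skip unparsable names
            if (!seen.Add(Tuple.Create(c1, c2))) duplicates.Add(name);
        }
        return duplicates;
    }
}
```

Calling `FileCodes.FindDuplicates(arquivos.Select(a => a.Name))` before the `Parallel.ForEach` would list any files that are bound to hit the duplicate key error regardless of how the writes are parallelized.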

Full error

Erro: WriteConcern detected an error ''. (Response was { "ok" : 1, "code" : 11000, "err" :
 "insertDocument :: caused by :: 11000 E11000 duplicate key error index:
 DB.Collection.$Codigo1_1_Codigo2_1  dup key: { : 359922397, : 1217185957 }", 
"n" : NumberLong(0) })

1 answer:

Answer 0 (score: 2)

The reason is in the driver's source code, which shows how the ObjectId is created:

static ObjectId()
    {
        __staticMachine = (GetMachineHash() + AppDomain.CurrentDomain.Id) & 0x00ffffff; // add AppDomain Id to ensure uniqueness across AppDomains
        __staticIncrement = (new Random()).Next();

        try
        {
            __staticPid = (short)GetCurrentProcessId(); // use low order two bytes only
        }
        catch (SecurityException)
        {
            __staticPid = 0;
        }
    }

But if you run:

  var po = new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount > 1 ? Environment.ProcessorCount / 2 : 1 };

        var items = new List<string>()
        {
            "Foo",
            "Bar"
        };
        Parallel.ForEach(items, po, (arquivo, state) =>
        {
            Console.WriteLine((new Random()).Next());
        });

you get:

  1259271181
  1259271181
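The repeated number comes from seeding. On the .NET Framework, the parameterless `Random` constructor derives its seed from the system tick count, so two instances created within the same tick start from the same seed. A minimal sketch of that effect, with the seed passed explicitly so the behaviour is reproducible (`SeedDemo` and `FirstValue` are hypothetical names):

```csharp
using System;

public static class SeedDemo
{
    // Returns the first value produced by a Random built from the given seed.
    public static int FirstValue(int seed)
    {
        return new Random(seed).Next();
    }

    public static void Main()
    {
        int seed = Environment.TickCount;
        Console.WriteLine(FirstValue(seed));
        Console.WriteLine(FirstValue(seed)); // same seed, so the same number again
    }
}
```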

because Random does not parallelize well. You have to define an Id that does not use ObjectId, or make the generation thread-safe.
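As a rough illustration of the "thread-safe" option, the sketch below keeps one `Random` per thread and gives each a different seed through `Interlocked.Increment`. `SafeRandom` is a hypothetical helper, not part of the MongoDB driver, and it does not change how the driver builds an ObjectId; it only shows the general pattern for random numbers inside `Parallel.ForEach`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class SafeRandom
{
    // Shared seed counter; each new thread takes the next value.
    private static int _seed = Environment.TickCount;

    // One Random instance per thread, each built from a different seed.
    private static readonly ThreadLocal<Random> Local =
        new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref _seed)));

    public static int Next()
    {
        return Local.Value.Next();
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new[] { "Foo", "Bar" };
        var results = new ConcurrentBag<int>();

        // No Random instance is shared between threads.
        Parallel.ForEach(items, item => results.Add(SafeRandom.Next()));

        foreach (int value in results) Console.WriteLine(value);
    }
}
```

A single shared `Random` guarded by a `lock` would also work; the per-thread version simply avoids contention.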

Following our discussion in the comments, I would create the Header class like this:

public class Header
{
   public Header()
   {
      Operations = new List<Operation>();
   }
   [BsonId]
   public Codes Id {get; set;}
   public Int64 Code2 {get; set;}
   public string Name { get; set; }
   public List<Operation> Operations { get; set; }
}
public class Codes {
   public Int64 Code1 {get; set;}
   public Int64 Code2 {get; set;}
}