Parallel data processing mixes up some of the data

Time: 2017-04-26 12:07:08

Tags: c# mysql multithreading concurrency parallel-processing

I'm trying to develop an application that combines parallel data processing with MySQL. This is the piece of code I'm having trouble with:

    public ConcurrentDictionary<string, Info> GetDatabaseForCurrentDay(System.DateTime day)
    {
        string[] date = day.ToShortDateString().Split('.');
        string sqlQuery = "SELECT * FROM testtable WHERE Date ='" + date[2] + "-" + date[1] + "-" + date[0] + "';";
        ConcurrentDictionary<string, Info> info = new ConcurrentDictionary<string, Info>();
        Info[] dayInfo = null;
        Parallel.ForEach(ReadData(ConnectionString, sqlQuery), data =>
        {
            int num = 2;
            string[] dataPieces = data.Split(new char[] { ',' }, num);
            FileHelpers.FileHelperEngine<Info> engine = new FileHelpers.FileHelperEngine<Info>();
            dayInfo = engine.ReadString(dataPieces[1], int.MaxValue);
            info.TryAdd(dataPieces[0], dayInfo[0]);
        });       
        return info;
    }

Besides this snippet, the function ReadData(ConnectionString, sqlQuery) is also worth mentioning, because it supplies the source sequence for the Parallel.ForEach loop.

    public IEnumerable<string> ReadData(string connectionString, string queryString)
    {
        using (MySqlConnection conn = new MySqlConnection(connectionString))
        {
            using (MySqlCommand comm = new MySqlCommand(queryString, conn))
            {
                conn.Open();
                string command2 = "USE testdatabase;";
                MySqlCommand commandUse = new MySqlCommand(command2, conn);
                commandUse.ExecuteNonQuery();
                comm.CommandTimeout = 0;
                MySqlDataReader reader = comm.ExecuteReader();
                if (reader.HasRows)
                {
                    while (reader.Read())
                    {
                        StringBuilder sb = new StringBuilder();
                        sb.Append(reader.GetString(0) + ",");
                        sb.Append(reader.GetDateTime(1).ToString("yyyy-MM-dd") + ",");
                        sb.Append(reader.GetDouble(2).ToString().Replace(',', '.') + ",");
                        sb.Append(reader.GetDouble(3).ToString().Replace(',', '.') + ",");
                        sb.Append(reader.GetDouble(4).ToString().Replace(',', '.') + ",");
                        sb.Append(reader.GetDouble(5).ToString().Replace(',', '.') + ",");
                        sb.Append(reader.GetUInt64(6) + ",");
                        sb.Append(reader.GetDouble(7).ToString().Replace(',', '.'));
                        yield return sb.ToString();
                    }
                }
            }
        }
    }

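Now, back to the problem. The code compiles and runs, but the results it returns are incorrect. I noticed that the ConcurrentDictionary contains keys paired with the wrong values. In short, info.TryAdd(dataPieces[0], dayInfo[0]) may insert a key from one thread together with a value from another thread, so the data can end up corrupted. I know this kind of behavior is a drawback of parallel processing, but I cannot drop the parallel approach. I have tried different ways to fix this, but nothing worked and the data is still wrong. Is there a solution that keeps the execution speed of this code and keeps the data correct?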

1 answer:

Answer 0: (score: 3)

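You need to move dayInfo inside the Parallel.ForEach loop. As written, it is a shared variable that every task keeps overwriting, which gives you garbage results: between the assignment to dayInfo on one thread and the read of dayInfo[0] on that same thread, another thread may already have replaced the array. If you declare it inside the delegate, each iteration gets its own private variable that cannot be overwritten by other iterations: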

    // Info[] dayInfo = null;   <--Remove this
    Parallel.ForEach(ReadData(ConnectionString, sqlQuery), data =>
    {
        int num = 2;
        string[] dataPieces = data.Split(new char[] { ',' }, num);
        FileHelpers.FileHelperEngine<Info> engine = new FileHelpers.FileHelperEngine<Info>();

        //declare dayInfo locally within this scope instead 
        var dayInfo = engine.ReadString(dataPieces[1], int.MaxValue);
        info.TryAdd(dataPieces[0], dayInfo[0]);
    });
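
To see the effect in isolation, here is a minimal, self-contained sketch that needs neither MySQL nor FileHelpers. Everything in it is made up for illustration: the class SharedVariableDemo, the Parse helper (a stand-in for engine.ReadString) and the generated "k{i},{i}" lines are assumptions, not part of the original code. It builds the dictionary once with a variable captured from outside the lambda and once with a local, then counts entries whose value does not match its key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SharedVariableDemo
{
    // Hypothetical stand-in for engine.ReadString: parses the payload into a one-element array.
    private static int[] Parse(string payload) => new[] { int.Parse(payload) };

    // Generates lines such as "k7,7", so a correct entry always maps "k7" to 7.
    private static IEnumerable<string> Lines(int count) =>
        Enumerable.Range(0, count).Select(i => $"k{i},{i}");

    // Mirrors the question: 'parsed' lives outside the lambda and is shared by all iterations.
    public static ConcurrentDictionary<string, int> BuildWithShared(int count)
    {
        var result = new ConcurrentDictionary<string, int>();
        int[] parsed = null;
        Parallel.ForEach(Lines(count), line =>
        {
            string[] pieces = line.Split(new[] { ',' }, 2);
            parsed = Parse(pieces[1]);            // another thread may overwrite this...
            result.TryAdd(pieces[0], parsed[0]);  // ...before it is read here
        });
        return result;
    }

    // Mirrors the answer: 'parsed' is declared inside the lambda, one per iteration.
    public static ConcurrentDictionary<string, int> BuildWithLocal(int count)
    {
        var result = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(Lines(count), line =>
        {
            string[] pieces = line.Split(new[] { ',' }, 2);
            int[] parsed = Parse(pieces[1]);
            result.TryAdd(pieces[0], parsed[0]);
        });
        return result;
    }

    // Counts entries whose value does not belong to their key.
    public static int CountMismatches(ConcurrentDictionary<string, int> dict) =>
        dict.Count(kv => kv.Key != "k" + kv.Value);

    public static void Main()
    {
        // The shared version is racy, so this number varies between runs and may even be 0.
        Console.WriteLine("shared: " + CountMismatches(BuildWithShared(100000)));
        // The local version has no shared mutable state besides the thread-safe dictionary.
        Console.WriteLine("local:  " + CountMismatches(BuildWithLocal(100000)));
    }
}
```

The ConcurrentDictionary is not at fault in either version: TryAdd is thread-safe, but it can only store the value it is handed, and in the shared version that value may already belong to a different row by the time it is read.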