Question

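I am trying to accomplish the following:

Fetch data from a SQL DB.
Pass the data to the PerformStuff method, which calls the third-party method MethodforResponse (it checks the input and returns a response).
Save the response (XML) back to the SQL DB.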

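Below is the sample code. Performance is poor: with 1,000,000 records in the database it is very slow.

Is there a better way to do this? Any ideas or tips for improving it?

Please help.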

using System;
using System.Data;
using System.Data.SqlClient;
using System.Text;
using thirdpartylib;
    public class Program
    {

        static void Main(string[] args)
        {
            var response = PerformStuff();
            Save(response);


        }

        public class TestRequest
        {
            public int col1 { get; set; }
            public bool col2 { get; set; }
            public string col3 { get; set; }
            public bool col4 { get; set; }

            public string col5 { get; set; }
            public bool col6 { get; set; }
            public string col7 { get; set; }

        }
        public class TestResponse
        {
            public int col1 { get; set; }
            public string col2 { get; set; }
            public string col3 { get; set; }
            public int col4 { get; set; }

        }
        public static TestRequest GetDataId(int id)
        {
            TestRequest testReq = null;
            try
            {
                SqlCommand cmd = DB.GetSqlCommand("proc_name");
                cmd.AddInSqlParam("@Id", SqlDbType.Int, id);
                SqlDataReader dr = new SqlDataReader(DB.GetDataReader(cmd));
                while (dr.Read())
                {
                    testReq = new TestRequest();

                    testReq.col1 = dr.GetInt32("col1");
                    testReq.col2 = dr.GetBoolean("col2");
                    testReq.col3 = dr.GetString("col3");
                    testReq.col4 = dr.GetBoolean("col4");
                    testReq.col5 = dr.GetString("col5");
                    testReq.col6 = dr.GetBoolean("col6");
                    testReq.col7 = dr.GetString("col7");



                }
                dr.Close();
            }

            catch (Exception ex)
            {
                throw;
            }
            return testReq;

        }
        public static TestResponse PerformStuff()
        {
            var response = new TestResponse();
            //give ids in list
            var ids = thirdpartylib.Methodforid();
            TestRequest request = null;

            foreach (int id in ids)
            {
                request = GetDataId(id);

                var output = thirdpartylib.MethodforResponse(request);

                foreach (var data in output.Elements())
                {
                    response.col4 = Convert.ToInt32(data.Id().Class());
                    response.col2 = data.Id().Name().ToString();
                }
            }
            //request details (from the last id processed)
            if (request != null)
            {
                response.col1 = request.col1;
                response.col2 = request.col2.ToString();
                response.col3 = request.col3;
            }

            return response;
        }

        public static void Save(TestResponse response)
        {

            var Sb = new StringBuilder();
            try
            {
                Sb.Append("<ROOT>");
                Sb.Append("<id");
                Sb.Append(" col1='" + response.col1 + "'");
                Sb.Append(" col2='" + response.col2 + "'");
                Sb.Append(" col3='" + response.col3 + "'");
                Sb.Append(" col4='" + response.col4 + "'");

                Sb.Append("></id>");
                Sb.Append("</ROOT>");
                var cmd = DB.GetSqlCommand("saveproc");
                cmd.AddInSqlParam("@Data", SqlDbType.VarChar, Sb.ToString());
                DB.ExecuteNoQuery(cmd);

            }
            catch (Exception ex)
            {

                throw;
            }
        }

    }

Thanks!

Answer 1

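Your question is very broad. The PerformStuff() method is fundamentally slow because it costs O(n) * db_lookup_time: every id triggers its own database round trip before its output is iterated. So it seems to me that you are approaching this problem the wrong way.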
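As a rough illustration of the O(n) * db_lookup_time cost, take the 1,000,000 records from the question and assume, purely for the sake of the arithmetic (not a measured figure), a 1 ms round trip per lookup:

```latex
T_{\text{lookup}} \approx n \cdot t_{\text{db}} = 10^{6} \times 1\,\text{ms} = 1000\,\text{s} \approx 16.7\,\text{min}
```

That time goes to the per-id lookups alone, before MethodforResponse or the save is counted.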

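Use the database's query language to optimize how the data is traversed. Iterating id by id and then checking the values, as you do now, gives the slowest possible lookup time.

Instead, take advantage of SQL's powerful query language and use clauses such as where id < 10 and value > 100, because ultimately you want to limit the size of the data set that C# has to handle.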

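So:

Read only the minimum amount of data you need from the database.
Process this data as a whole; parallelism may help here.
Write the modifications back over a single database connection.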
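A minimal sketch of the second point, processing an already-loaded batch as a whole and in parallel with PLINQ. WorkItem, WorkResult, BatchProcessor and CallThirdParty are hypothetical names; CallThirdParty only stands in for thirdpartylib.MethodforResponse, and running the real method this way is only safe if it is thread-safe, which the question does not say:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical input/output shapes standing in for TestRequest/TestResponse.
public record WorkItem(int Id, string Payload);
public record WorkResult(int Id, string Result);

public static class BatchProcessor
{
    // Hypothetical stand-in for thirdpartylib.MethodforResponse.
    // Calling it in parallel is only safe if the real method is thread-safe.
    public static WorkResult CallThirdParty(WorkItem item) =>
        new WorkResult(item.Id, item.Payload.ToUpperInvariant());

    // Processes a batch that was already loaded from the database in one query.
    // PLINQ spreads the calls over several threads; AsOrdered keeps the
    // results in the same order as the input batch.
    public static List<WorkResult> ProcessBatch(IEnumerable<WorkItem> batch, int maxDegreeOfParallelism = 4) =>
        batch.AsParallel()
             .AsOrdered()
             .WithDegreeOfParallelism(maxDegreeOfParallelism)
             .Select(CallThirdParty)
             .ToList();
}
```

AsOrdered costs some throughput; it can be dropped if the order of the results does not matter.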

Hopefully this points you in the right direction.

Answer 2

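I think the root of the problem is that you fetch and insert the data record by record. There is no way to optimize that; you need to change the approach as a whole.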

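You should consider the following solution:
1. Fetch all the data from the database with a single command.
2. Process it.
3. Save it back to the database with a single command, using a technique such as BULK INSERT. Note that BULK INSERT has certain limitations, so read the documentation carefully.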
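BULK INSERT is a T-SQL statement that loads data from a file. From C#, a closely related option is the ADO.NET SqlBulkCopy class, which this sketch swaps in. It collects all processed rows into one DataTable so they can be written in a single bulk operation. ProcessedRow, BulkSaveSketch and the dbo.Responses table name are hypothetical, and the SqlBulkCopy call is left as a comment because it needs the SqlClient package and a live connection:

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical processed result; columns mirror the question's TestResponse.
public record ProcessedRow(int Col1, string Col2, string Col3, int Col4);

public static class BulkSaveSketch
{
    // Builds one in-memory DataTable holding every processed row, so the
    // whole batch can be sent to SQL Server in a single bulk operation
    // instead of one stored-procedure call per record.
    public static DataTable ToDataTable(IEnumerable<ProcessedRow> rows)
    {
        var table = new DataTable("Responses"); // hypothetical table name
        table.Columns.Add("col1", typeof(int));
        table.Columns.Add("col2", typeof(string));
        table.Columns.Add("col3", typeof(string));
        table.Columns.Add("col4", typeof(int));

        foreach (var r in rows)
            table.Rows.Add(r.Col1, r.Col2, r.Col3, r.Col4);

        return table;
    }

    // Sending it (requires Microsoft.Data.SqlClient or System.Data.SqlClient
    // and an open SqlConnection named `connection`):
    //
    //   using var bulk = new SqlBulkCopy(connection)
    //   {
    //       DestinationTableName = "dbo.Responses", // hypothetical
    //       BatchSize = 10_000
    //   };
    //   bulk.WriteToServer(table);
}
```

By default SqlBulkCopy maps columns by ordinal, so if the destination table's column order differs from the DataTable's, add entries to ColumnMappings.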

Answer 3

Based on your comments, there are several improvements you can make to your solution, from memory consumption to CPU usage.

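Take advantage of paging at the database level. Instead of fetching all the records at once, which with 1+ million records risks memory leaks and/or heavy memory consumption, process everything that needs processing chunk by chunk.
Since you don't need to save the XML in the database, you could save the responses to a file instead. Writing the XML to a file lets you stream the data to the local disk.
Use XmlSerializer instead of assembling the XML yourself. XmlSerializer combines well with XmlWriter, which in turn can write to any stream, including a FileStream. There is a thread that can serve as an example.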
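A minimal sketch of the XmlSerializer + XmlWriter combination, writing the same ROOT/id shape that Save builds by hand. IdResult and XmlStreamSketch are hypothetical names, and the attribute names are assumed to match the question's XML. Unlike string concatenation, the serializer escapes characters such as < and ' in attribute values:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical serializable response; XmlSerializer needs a public type with
// a public parameterless constructor and public read/write members.
public class IdResult
{
    [XmlAttribute("col1")] public int Col1 { get; set; }
    [XmlAttribute("col2")] public string Col2 { get; set; } = "";
    [XmlAttribute("col3")] public string Col3 { get; set; } = "";
    [XmlAttribute("col4")] public int Col4 { get; set; }
}

public static class XmlStreamSketch
{
    // Writes <ROOT><id .../>...</ROOT> to any stream one element at a time,
    // so the output never has to be held in memory as one big string.
    public static void WriteAll(Stream output, IEnumerable<IdResult> results)
    {
        // Renames the root of each serialized object from "IdResult" to "id".
        var serializer = new XmlSerializer(typeof(IdResult), new XmlRootAttribute("id"));

        // Suppresses the default xsi/xsd namespace declarations.
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        // CloseOutput defaults to false, so the caller's stream stays open.
        using var writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true });
        writer.WriteStartDocument();
        writer.WriteStartElement("ROOT");
        foreach (var r in results)
            serializer.Serialize(writer, r, noNamespaces); // values are escaped
        writer.WriteEndElement();
        writer.WriteEndDocument();
    }
}
```

To stream to disk, pass a FileStream, for example using var fs = File.Create("responses.xml"); XmlStreamSketch.WriteAll(fs, results); (the file name is illustrative). The XmlSerializer(Type, XmlRootAttribute) overload is not cached by the runtime and generates a new assembly each time it is constructed, so create it once per output, as here, rather than once per record.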

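In the end, the PerformStuff method will not only be faster but will also need fewer resources (memory, CPU). Most importantly, you will be able to limit the program's resource consumption easily, by changing the database page size.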
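The database paging and the page-size knob could look roughly like the following. This sketch uses keyset paging (WHERE id > last id seen), one common way to page; fetchPage is a hypothetical delegate that would wrap a query like the one in the comment, and PagingSketch is an illustrative name:

```csharp
using System;
using System.Collections.Generic;

public static class PagingSketch
{
    // Reads all rows page by page using keyset paging: each page asks for the
    // next `pageSize` rows whose id is greater than the last id already seen.
    // `fetchPage(lastId, pageSize)` is a hypothetical delegate that would run
    // something like:
    //   SELECT TOP (@pageSize) id, col1, ... FROM dbo.Source
    //   WHERE id > @lastId ORDER BY id;
    // This method itself holds only one page in memory at a time.
    public static IEnumerable<T> ReadInPages<T>(
        Func<int, int, IReadOnlyList<T>> fetchPage,
        Func<T, int> getId,
        int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int lastId = int.MinValue;
        while (true)
        {
            var page = fetchPage(lastId, pageSize);
            foreach (var row in page)
                yield return row;

            if (page.Count < pageSize) yield break; // short page: no more rows
            lastId = getId(page[page.Count - 1]);
        }
    }
}
```

Each page can then be processed and written out before the next one is read, so peak memory is bounded by pageSize rather than by the table size.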

Answer 4

Observation: your requirements look like they fit the map/reduce pattern.

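If the values in the ids collection returned by thirdpartylib.Methodforid() are fairly dense, and the number of rows in the table behind the proc_name stored procedure is close to the number of items in the ids collection, you can retrieve all the records you need with a single SQL query (and a multi-row result set) instead of retrieving them one by one. It might look something like this: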

public static TestResponse PerformStuff()
{
    var response = new TestResponse();

    var idHash = new HashSet<int> (thirdpartylib.Methodforid());

    SqlCommand cmd = DB.GetSqlCommand("proc_name_for_all_ids");
    using (SqlDataReader dr = new SqlDataReader(DB.GetDataReader(cmd))) { 
        while (dr.Read()) {
            var id = dr.GetInt32("id");
            if (idHash.Contains(id)) {
                var testReq = new TestRequest();

                testReq.col1 = dr.GetInt32("col1");
                testReq.col2 = dr.GetBoolean("col2");
                testReq.col3 = dr.GetString("col3");
                testReq.col4 = dr.GetBoolean("col4");
                testReq.col5 = dr.GetString("col5");
                testReq.col6 = dr.GetBoolean("col6");
                testReq.col7 = dr.GetString("col7");

                var output = thirdpartylib.MethodforResponse(testReq);
                foreach (var data in output.Elements())  {
                    response.col4 = Convert.ToInt32(data.Id().Class());
                    response.col2 = data.Id().Name().ToString();
                }
            } /* end if hash.Contains(id) */  
        }  /* end while dr.Read() */
    } /* end using() */
    return response;
}

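Why would this be faster? It replaces many separate database queries with a single stream of rows to process. That will be more efficient than your example.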

Why might it not work?

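It won't work if the ids must be processed in the same order in which thirdpartylib.Methodforid() produces them.
It won't work if you cannot retrieve all the rows, i.e. if no proc_name_for_all_ids stored procedure is available; without it there is no way to stream the rows.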

How to efficiently fetch, process and save huge record sets in C#?
