Question

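I have a .csv file with 100,000 records and five columns. I am reading it line by line and storing the records in a remote database.

Previously, I was following a performance-oriented approach. I was reading the .csv file line by line and, in the same transaction, opening a connection to the database and closing it again. This incurred a serious performance overhead: writing just 10,000 lines took an hour.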

    using (FileStream reader = File.OpenRead(@"C:\Data.csv"))
    using (TextFieldParser parser = new TextFieldParser(reader))
    {
        parser.TrimWhiteSpace = true; // if you want
        parser.Delimiters = new[] { " " };
        parser.HasFieldsEnclosedInQuotes = true;

        while (!parser.EndOfData)
        {
            // Open a connection to the database
            // Write the data from the .csv file line by line
            // Close the connection
        }
    }

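Now I have changed the approach. For testing purposes, I took a .csv file with 10,000 rows; after reading all 10,000 rows, I open a single connection to the database and write them there.

Now the only problem is this: I want to read the first 10,000 rows and write them, then read the next 10,000 rows and write them, and so on.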

    using (FileStream reader = File.OpenRead(@"C:\Data.csv"))
    using (TextFieldParser parser = new TextFieldParser(reader))

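But the two lines above will read the whole file, and I don't want to read it all at once. Is there a way to read the .csv file chunk by chunk, in chunks of 10,000 rows each?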

Answer 1

Try the code below; it reads the data from the CSV chunk by chunk.

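    // Requires: using System.Collections.Generic; using System.Data; using System.IO;
    IEnumerable<DataTable> GetFileData(string sourceFileFullName)
    {
        int chunkRowCount = 0;
        // Declared outside the loop so it is still in scope after the loop ends.
        DataTable chunkDataTable = null;

        using (var sr = new StreamReader(sourceFileFullName))
        {
            string line = null;
            // Read lines from the file until the end of the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                // Start a new table at the beginning of each chunk.
                if (chunkDataTable == null)
                    chunkDataTable = new DataTable(); // define your columns here

                // Code for filling the DataTable (or whatever you use) from `line` goes here.
                chunkRowCount++;

                if (chunkRowCount == 10000)
                {
                    chunkRowCount = 0;
                    yield return chunkDataTable;
                    chunkDataTable = null;
                }
            }
        }

        // Return the last set of data, which is smaller than the chunk size.
        if (null != chunkDataTable)
            yield return chunkDataTable;
    }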
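
For reference, here is a minimal self-contained sketch of the same chunking idea with the placeholder filled in. The names `CsvChunkReader` and `ReadInChunks` are made up for illustration, and it assumes that fields are separated by a plain comma with no quoted fields; if your file needs real CSV parsing, the `line.Split(',')` call would have to be replaced with something like the `TextFieldParser` from the question. It takes a `TextReader`, so nothing beyond the current chunk is held in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvChunkReader
{
    // Lazily yields lists of at most chunkSize rows.
    // Assumption: each line is split on a plain comma (no quoted fields).
    public static IEnumerable<List<string[]>> ReadInChunks(TextReader reader, int chunkSize)
    {
        var chunk = new List<string[]>(chunkSize);
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            chunk.Add(line.Split(','));

            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                // Start a fresh list so the caller keeps ownership of the previous one.
                chunk = new List<string[]>(chunkSize);
            }
        }

        // Return the last, possibly smaller, chunk.
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

You would wrap a `new StreamReader(@"C:\Data.csv")` in a `using` block, loop over `CsvChunkReader.ReadInChunks(sr, 10000)` with `foreach`, and inside the loop body open one connection, write that chunk, and close the connection. Because the method is an iterator, the next 10,000 lines are only read when the loop asks for the next chunk.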
