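I have a stored procedure that returns a result set consisting of a single column with millions of unprocessed rows. I need to transfer this data to another server using SqlBulkCopy, but the problem is that I can't simply do the following: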
using (var con = new SqlConnection(sqlConnectionStringSource))
{
    using (var cmd = new SqlCommand("usp_GetUnprocessedData", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        con.Open();
        using (var reader = cmd.ExecuteReader())
        {
            using (var sqlBulk = new SqlBulkCopy(sqlConnectionStringDestination))
            {
                sqlBulk.DestinationTableName = "BulkCopy";
                sqlBulk.BulkCopyTimeout = 0;
                sqlBulk.BatchSize = 200000;
                sqlBulk.WriteToServer(reader);
            }
        }
    }
}
because the data would not be processed at all.
In my case, the nth row of the result set looks like this:
value1_n,value2_n,value3_n
where n is just a subscript I use to distinguish the rows.
In the destination table, which I have named BulkCopy, I would like to have:
╔══════════╦══════════╦══════════╗
║ Field1 ║ Field2 ║ Field3 ║
╠══════════╬══════════╬══════════╣
║ Value1_1 ║ Value2_1 ║ Value3_1 ║
║ Value1_2 ║ Value2_2 ║ Value3_2 ║
║ ... ║ ... ║ ... ║
║ Value1_n ║ Value2_n ║ Value3_n ║
╚══════════╩══════════╩══════════╝
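The per-row transformation described above can be sketched as a small helper. This is only an illustration: the name RowSplitter is made up here, and it assumes the values never contain commas or quotes themselves, which the question does not state.

```csharp
using System;

public static class RowSplitter
{
    // Splits "a,b,c" into exactly three fields.
    // Assumption: values contain no embedded commas or quoting.
    public static (string Field1, string Field2, string Field3) Split(string row)
    {
        if (row == null) throw new ArgumentNullException(nameof(row));

        var parts = row.Split(',');
        if (parts.Length != 3)
        {
            throw new FormatException($"Expected 3 fields but found {parts.Length}: '{row}'");
        }
        return (parts[0], parts[1], parts[2]);
    }
}
```

Calling RowSplitter.Split("value1_1,value2_1,value3_1") yields the three values that should land in Field1, Field2 and Field3.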
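I was told to use a custom DataReader, through an implementation of the IDataReader interface, to process the data row by row before SqlBulkCopy copies the data from it, so that only a small amount of data is in memory at any time, but I have no idea where to start.
Can you help me?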
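Answer 0 (score: 1)
Let's turn the problem around. Instead of looking for a generic solution, create a specific one for *this* problem. Having spent days creating an IDataReader wrapper, I know it isn't trivial.
We know how many fields there are, and we don't care about any other fields in the result. Instead of trying to implement an IDataReader wrapper correctly, we can create an iterator method that splits the data and returns the records one by one in a streaming fashion. FastMember's ObjectReader can wrap an IDataReader interface over any IEnumerable: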
class MyDTO
{
    public string Field1{get;set;}
    public string Field2{get;set;}
    public string Field3{get;set;}
}
public IEnumerable<MyDTO> ReaderToStream(IDataReader reader)
{
    while(reader.Read())
    {
        var line=reader.GetString(0);
        //Split is an instance method; the initializer members are separated by commas
        var fields=line.Split(',');
        yield return new MyDTO{Field1=fields[0],Field2=fields[1],Field3=fields[2]};
    }
}
The import method can be changed to:
using (var con = new SqlConnection(sqlConnectionStringSource))
{
    ...
    using (var reader = cmd.ExecuteReader())
    {
        var recordStream=ReaderToStream(reader);
        //Name the members explicitly so the columns come out as Field1, Field2, Field3
        using(var rd=ObjectReader.Create(recordStream,"Field1","Field2","Field3"))
        using (var sqlBulk = new SqlBulkCopy(sqlConnectionStringDestination))
        {
            ...
            sqlBulk.WriteToServer(rd);
        }
    }
}
The iterator calls Read() only when SqlBulkCopy requests a new record, so we don't end up loading everything into memory.
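To see the pull-based behaviour without a database, here is a minimal sketch that swaps the data reader for an in-memory generator. LazyDemo, Source and RowsRead are hypothetical names used only for this illustration; Split has the same shape as ReaderToStream.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Counts how many source rows have been pulled so far.
    public static int RowsRead;

    // Stand-in for the data reader: produces rows one at a time, on demand.
    public static IEnumerable<string> Source()
    {
        for (var i = 1; i <= 1000000; i++)
        {
            RowsRead++;
            yield return $"value1_{i},value2_{i},value3_{i}";
        }
    }

    // Same shape as ReaderToStream: a row is split only when the consumer asks for it.
    public static IEnumerable<string[]> Split(IEnumerable<string> rows)
    {
        foreach (var row in rows)
        {
            yield return row.Split(',');
        }
    }
}
```

If a consumer takes the enumerator of LazyDemo.Split(LazyDemo.Source()) and calls MoveNext() twice, RowsRead goes up by two rather than by a million. The consumer sets the pace, which is the role SqlBulkCopy plays in the import method.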
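And the IDataReader wrapper
ReSharper and Visual Studio 2019 offer a refactoring that implements an interface by delegating the calls to a wrapped class. In Visual Studio 2019 this is called Implement interface through 'field_name'.
Starting from this code: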
class ReaderWrapper:IDataReader
{
    private readonly IDataReader _inner ;
    public ReaderWrapper(IDataReader inner)
    {
        _inner = inner;
    }
}
Applying the refactoring gives:
class ReaderWrapper:IDataReader
{
    private readonly IDataReader _inner ;
    public ReaderWrapper(IDataReader inner)
    {
        _inner = inner;
    }
    public object this[int i] => _inner[i];
    public object this[string name] => _inner[name];
    public int Depth => _inner.Depth;
    public bool IsClosed => _inner.IsClosed;
    public int RecordsAffected => _inner.RecordsAffected;
    public int FieldCount => _inner.FieldCount;
    public void Close() => _inner.Close();
    public void Dispose() => _inner.Dispose();
    public bool GetBoolean(int i) => _inner.GetBoolean(i);
    public byte GetByte(int i) => _inner.GetByte(i);
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length) => _inner.GetBytes(i, fieldOffset, buffer, bufferoffset, length);
    public char GetChar(int i) => _inner.GetChar(i);
    public long GetChars(int i, long fieldoffset, char[] buffer, int bufferoffset, int length) => _inner.GetChars(i, fieldoffset, buffer, bufferoffset, length);
    public IDataReader GetData(int i) => _inner.GetData(i);
    public string GetDataTypeName(int i) => _inner.GetDataTypeName(i);
    public DateTime GetDateTime(int i) => _inner.GetDateTime(i);
    public decimal GetDecimal(int i) => _inner.GetDecimal(i);
    public double GetDouble(int i) => _inner.GetDouble(i);
    public Type GetFieldType(int i) => _inner.GetFieldType(i);
    public float GetFloat(int i) => _inner.GetFloat(i);
    public Guid GetGuid(int i) => _inner.GetGuid(i);
    public short GetInt16(int i) => _inner.GetInt16(i);
    public int GetInt32(int i) => _inner.GetInt32(i);
    public long GetInt64(int i) => _inner.GetInt64(i);
    public string GetName(int i) => _inner.GetName(i);
    public int GetOrdinal(string name) => _inner.GetOrdinal(name);
    public DataTable GetSchemaTable() => _inner.GetSchemaTable();
    public string GetString(int i) => _inner.GetString(i);
    public object GetValue(int i) => _inner.GetValue(i);
    public int GetValues(object[] values) => _inner.GetValues(values);
    public bool IsDBNull(int i) => _inner.IsDBNull(i);
    public bool NextResult() => _inner.NextResult();
    public bool Read() => _inner.Read();
}
To create the splitting wrapper, we need to replace Read() with our own version:
private string[] _values;
public bool Read()
{
    var ok = _inner.Read();
    if (ok)
    {
        //It *could be null*
        if (_inner.IsDBNull(0))
        {
            //What to do? Store an empty array for now
            _values = new string[0];
        }
        else
        {
            //Only read the string when the column is not NULL, otherwise GetString throws
            var fieldValue = _inner.GetString(0);
            _values = fieldValue.Split(',');
        }
    }
    return ok;
}
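This splits the CSV values and stores them in a string array. It also shows why implementing a wrapper is somewhat troublesome: we have to handle quite a few things and decide what to do in unexpected situations such as nulls, empty strings, and so on.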
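One way to pin those decisions down is a helper that always returns a fixed number of slots. The sketch below is only one possible policy, not something the wrapper above does: CsvFieldParser is a hypothetical name, and the choices (NULL or empty input becomes all-null fields, an empty value becomes null, too many values is an error) are assumptions.

```csharp
using System;

public static class CsvFieldParser
{
    // Always returns exactly `expected` slots.
    // Missing values are null; more values than expected raise an error.
    public static string[] Parse(string raw, int expected)
    {
        var result = new string[expected];

        // A NULL column or an empty string yields a row of nulls.
        if (string.IsNullOrEmpty(raw)) return result;

        var parts = raw.Split(',');
        if (parts.Length > expected)
        {
            throw new FormatException($"Expected at most {expected} values but found {parts.Length}.");
        }

        for (var i = 0; i < parts.Length; i++)
        {
            // Treat an empty value (as in "a,,c") as null rather than an empty string.
            result[i] = parts[i].Length == 0 ? null : parts[i];
        }
        return result;
    }
}
```

With a fixed-size result, FieldCount can return a constant and IsDBNull can simply check the slot for null.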
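After that, we need to add our own implementations for the members that SqlBulkCopy calls. GetValue() is definitely called, and so is FieldCount. Other members are called depending on the column mapping type, by name or by ordinal.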
public int FieldCount => _values.Length;
public string GetString(int ordinal) => _values[ordinal];
public object GetValue(int ordinal)=> _values[ordinal];
//What if we have more values than expected? Copy only as many as the target array can hold
public int GetValues(object[] values)
{
    var count = Math.Min(_values.Length, values.Length);
    Array.Copy(_values, values, count);
    return count;
}
Now for the "fun" part. What about GetName()? Probably:
public string GetName(int ordinal) => $"Field{ordinal}";
GetOrdinal? It may be called for name mappings. This gets tricky:
public int GetOrdinal(string name) => int.Parse(name.Substring(5));
Let's hope this works.
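A slightly more defensive variant of the name/ordinal pair could look like the sketch below. FieldNames is a hypothetical holder class. Throwing IndexOutOfRangeException for an unknown name mirrors what SqlDataReader.GetOrdinal does. Note that these names are zero-based (Field0, Field1, ...), whereas the destination columns in the question are Field1 to Field3, so a mapping by name would need an offset.

```csharp
using System;

public static class FieldNames
{
    private const string Prefix = "Field";

    // Synthetic column name for a zero-based ordinal, e.g. 2 -> "Field2".
    public static string GetName(int ordinal) => Prefix + ordinal;

    // Inverse of GetName; rejects names that don't follow the "Field<n>" pattern.
    public static int GetOrdinal(string name)
    {
        if (name != null
            && name.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase)
            && int.TryParse(name.Substring(Prefix.Length), out var ordinal)
            && ordinal >= 0)
        {
            return ordinal;
        }
        throw new IndexOutOfRangeException($"Unknown column name '{name}'.");
    }
}
```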
We also need to override the indexers:
public object this[string name] => _values[GetOrdinal(name)];
public object this[int i] => _values[i];
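What did I forget? ... We still need to handle an arbitrary number of values. We need to handle nulls. There is no GetSchemaTable, which probably means the column mappings have to be specified explicitly by ordinal.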
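Specifying the mappings by ordinal could look like this minimal sketch. BulkCopyMappings and MapByOrdinal are hypothetical names, and it assumes the source and destination columns are in the same order. SqlBulkCopy.ColumnMappings.Add(int, int) is the overload that takes a source ordinal and a destination ordinal.

```csharp
using System.Data.SqlClient;

public static class BulkCopyMappings
{
    // Maps source ordinal i to destination ordinal i for the first `fieldCount` columns.
    // Assumption: both sides list their columns in the same order.
    public static void MapByOrdinal(SqlBulkCopy bulkCopy, int fieldCount)
    {
        for (var i = 0; i < fieldCount; i++)
        {
            bulkCopy.ColumnMappings.Add(i, i);
        }
    }
}
```

It would be called once, for example BulkCopyMappings.MapByOrdinal(sqlBulk, 3), before WriteToServer.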
A quick-and-dirty IsDBNull implementation could be:
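public bool IsDBNull(int i)
{
    //Covers the "null" case too, when `Length` is 0
    if (i>_values.Length-1)
    {
        return true;
    }
    //The inner reader has a single column, so asking it about ordinal i would throw for i>0.
    //Values produced by Split() are never null
    return false;
}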
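GetSchemaTable is hard, because we don't really know how many values each record contains. The table has 20+ columns, so I'd rather not write that code until I see that it's needed.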
public DataTable GetSchemaTable() => throw new NotImplementedException();
Left as an exercise for the reader.
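If the number of fields were fixed, a minimal schema table might be built along these lines. This is a sketch under that assumption, with a hypothetical SchemaTableSketch class; it fills in only a handful of the columns a real provider returns, and this answer does not establish whether SqlBulkCopy would ever consult it for a plain IDataReader.

```csharp
using System;
using System.Data;
using System.Data.Common;

public static class SchemaTableSketch
{
    // Describes `fieldCount` nullable string columns named Field0..Field{n-1},
    // one schema row per column.
    public static DataTable Build(int fieldCount)
    {
        var schema = new DataTable("SchemaTable");
        schema.Columns.Add(SchemaTableColumn.ColumnName, typeof(string));
        schema.Columns.Add(SchemaTableColumn.ColumnOrdinal, typeof(int));
        schema.Columns.Add(SchemaTableColumn.DataType, typeof(Type));
        schema.Columns.Add(SchemaTableColumn.AllowDBNull, typeof(bool));

        for (var i = 0; i < fieldCount; i++)
        {
            schema.Rows.Add($"Field{i}", i, typeof(string), true);
        }
        return schema;
    }
}
```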
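PPS: Default interface implementations, because why not
This could be a good case where C# 8's default interface methods can be used to create a wrapped-reader trait. By default, defer to the wrapped inner reader. This would eliminate all the deferred calls in the implementation.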
interface IReaderWrapper:IDataReader
{
//Gives access to the wrapped reader in the concrete classes
abstract IDataReader Inner();
override object this[int i] => Inner()[i];
override object this[string name] => Inner()[name];
override int Depth => Inner().Depth;
override bool IsClosed => Inner().IsClosed;
...
}
class SplitterWrapper:IReaderWrapper
{
private readonly IDataReader _inner ;
public SplitterWrapper(IDataReader inner)
{
_inner = inner;
}
IDataReader Inner()=> _inner;
string[] _values;
public object this[int i] => _values[i];
...
}
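This doesn't work with the C# 8 compiler that ships with VS 2019, and it somehow crashes Sharplab.io. I don't know whether it would compile or whether the overrides are really needed.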
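For what it's worth, C# 8 lets a derived interface supply defaults for base-interface members through explicit interface implementations rather than abstract/override. The sketch below shows that form on a cut-down, hypothetical IMiniReader instead of the full IDataReader, to keep it short. It needs C# 8 and a runtime that supports default interface members (.NET Core 3.0 or later).

```csharp
using System;

// Hypothetical two-member stand-in for IDataReader.
public interface IMiniReader
{
    int FieldCount { get; }
    object GetValue(int i);
}

// Wrapper "trait": explicit implementations provide defaults that defer to Inner.
public interface IMiniReaderWrapper : IMiniReader
{
    IMiniReader Inner { get; }
    int IMiniReader.FieldCount => Inner.FieldCount;
    object IMiniReader.GetValue(int i) => Inner.GetValue(i);
}

// Simple concrete reader over an in-memory array.
public sealed class ArrayReader : IMiniReader
{
    private readonly object[] _values;
    public ArrayReader(params object[] values) => _values = values;
    public int FieldCount => _values.Length;
    public object GetValue(int i) => _values[i];
}

// Replaces only GetValue; FieldCount falls back to the interface default.
public sealed class UpperCaseWrapper : IMiniReaderWrapper
{
    public UpperCaseWrapper(IMiniReader inner) => Inner = inner;
    public IMiniReader Inner { get; }
    public object GetValue(int i) => Inner.GetValue(i)?.ToString().ToUpperInvariant();
}
```

The defaulted members are reachable only through the interface type, so a caller would hold the wrapper as IMiniReader (or, in the real case, IDataReader), which is how SqlBulkCopy would see it anyway.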
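Answer 1 (score: 0)
I found the following Code Project article: https://www.codeproject.com/script/Articles/ViewDownloads.aspx?aid=1095790. It looks like you have to take the CSV data and split it into objects. I modified the Code Project code as shown below. Many of the typed getters are not implemented, and you may need to implement some additional methods. I'm also not sure what type the resulting values should be.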
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
namespace ConsoleApplication108
{
class Program
{
static void Main(string[] args)
{
}
}
public class MyDataReader : IDataReader
{
private SqlConnection conn { get; set; }
private SqlCommand cmd { get; set; }
private SqlDataReader reader { get; set; }
private DataTable schemaTable { get; set; }
private string data { get; set; }
private object[] arrayData { get; set; }
private IEnumerator<object> m_dataEnumerator { get; set; }
public MyDataReader(string commandText, string connectionString, List<KeyValuePair<string, Type>> columns)
{
conn = new SqlConnection(connectionString);
conn.Open();
cmd = new SqlCommand(commandText, conn);
reader = cmd.ExecuteReader();
schemaTable = new DataTable();
foreach(KeyValuePair<string,Type> col in columns)
{
schemaTable.Columns.Add(col.Key, col.Value);
}
}
public Boolean NextResult()
{
// Moving to the next result set is not the same as reading the next row
return reader.NextResult();
}
public int RecordsAffected
{
get { return -1; }
}
public int Depth
{
get { return -1; }
}
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}
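private void Dispose(bool disposing)
{
if (disposing)
{
if (m_dataEnumerator != null)
{
m_dataEnumerator.Dispose();
m_dataEnumerator = null;
}
// Also release the reader, command and connection opened in the constructor
if (reader != null) reader.Dispose();
if (cmd != null) cmd.Dispose();
if (conn != null) conn.Dispose();
}
}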
public Boolean IsClosed {
get { return reader.IsClosed; }
}
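public Boolean Read()
{
if (IsClosed)
{
throw new ObjectDisposedException(GetType().Name);
}
// Advance the underlying reader first; m_dataEnumerator is never assigned, so it can't be used here
if (!reader.Read())
{
return false;
}
// Split the single CSV column of the current row into separate values
arrayData = reader.GetString(0).Split(new char[] { ',' });
return true;
}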
public DataTable GetSchemaTable()
{
return schemaTable;
}
public void Close()
{
Dispose();
}
public object this[string name]
{
get { throw new NotImplementedException(); }
}
public object this[int i]
{
get { return arrayData[i]; }
}
public int FieldCount
{
get { return arrayData.Length; }
}
public bool IsDBNull(int i)
{
throw new NotImplementedException();
}
public bool GetBoolean(int i)
{
throw new NotImplementedException();
}
public byte GetByte(int i)
{
throw new NotImplementedException();
}
public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length)
{
throw new NotImplementedException();
}
public char GetChar(int i)
{
throw new NotImplementedException();
}
public long GetChars(int i, long fieldoffset, char[] buffer, int bufferoffset, int length)
{
throw new NotImplementedException();
}
public IDataReader GetData(int i)
{
throw new NotImplementedException();
}
public string GetDataTypeName(int i)
{
throw new NotImplementedException();
}
public DateTime GetDateTime(int i)
{
throw new NotImplementedException();
}
public decimal GetDecimal(int i)
{
throw new NotImplementedException();
}
public double GetDouble(int i)
{
throw new NotImplementedException();
}
public Type GetFieldType(int i)
{
throw new NotImplementedException();
}
public float GetFloat(int i)
{
throw new NotImplementedException();
}
public Guid GetGuid(int i)
{
throw new NotImplementedException();
}
public short GetInt16(int i)
{
throw new NotImplementedException();
}
public int GetInt32(int i)
{
throw new NotImplementedException();
}
public long GetInt64(int i)
{
throw new NotImplementedException();
}
public string GetName(int i)
{
throw new NotImplementedException();
}
public string GetString(int i)
{
throw new NotImplementedException();
}
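public int GetValues(object[] values)
{
// Assigning to the parameter would not reach the caller; copy into the caller's array instead
int count = Math.Min(arrayData.Length, values.Length);
Array.Copy(arrayData, values, count);
return count;
}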
public int GetOrdinal(string name)
{
throw new NotImplementedException();
}
public object GetValue(int i)
{
return arrayData[i];
}
}
}