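I have 1M HTML files that I need to parse, and I then insert the extracted information into my SQL Server database. Because of the relationships between the parsed objects, each piece of parsed information ends up in several tables. I am currently using Entity Framework for this, but adding each piece of information to the correct objects in the EF context takes a long time and is inefficient. I need something faster, especially since I have so many files to process.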
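What is a fast way to parse a large number of files in parallel and insert the results into SQL Server when the items being added have relationships to each other?
Also, is there a better technology for this, such as Informatica?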
Answer 0 (score: 0)
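I think the SqlBulkCopy class would be the best choice in this case.
You can create a generic wrapper around the SqlBulkCopy class so that you can use SqlBulkCopy with any entity. Below is a wrapper for LINQ to SQL, but the same idea applies to Entity Framework, assuming your entities map one-to-one to tables.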
    // Lives in a System.Data.Linq.DataContext subclass: Mapping, Connection and
    // Transaction are DataContext members.
    // Needs: System.Data, System.Data.Linq, System.Data.SqlClient,
    // System.Globalization, System.Linq.
    // IBusinessObject, AssertUtilities, DataAccessException and
    // DataAccessExceptionTranslator are the author's own helper types (not shown).
    public void BulkInsert<TBusinessObject>(IEnumerable<TBusinessObject> entities, int timeoutInSeconds)
        where TBusinessObject : class, IBusinessObject
    {
        AssertUtilities.ArgumentAllNotNull(entities, "entities");
        AssertUtilities.ArgumentNotNegative(timeoutInSeconds, "timeoutInSeconds");

        var metaTable = Mapping.GetTable(typeof(TBusinessObject));
        if (metaTable == null)
            throw new DataAccessException("MetaTable is not found.");

        // Skip database-generated members (identity columns, timestamps, ...).
        var insertDataMembers = metaTable.RowType.PersistentDataMembers
            .Where(arg => !arg.IsDbGenerated)
            .OrderBy(arg => arg.Ordinal)
            .ToList();

        using (var dataTable = new DataTable())
        {
            dataTable.Locale = CultureInfo.InvariantCulture;

            // One DataTable column per mapped database column.
            var dataColumns = insertDataMembers
                .Select(arg => new DataColumn(arg.MappedName))
                .ToArray();
            dataTable.Columns.AddRange(dataColumns);

            // Copy each entity's values into a row, in the same order as the columns.
            foreach (var entity in entities)
            {
                var itemArray = insertDataMembers
                    .Select(arg => arg.StorageAccessor.GetBoxedValue(entity))
                    .ToArray();
                dataTable.Rows.Add(itemArray);
            }

            try
            {
                if (Connection.State != ConnectionState.Open)
                    Connection.Open();

                var sqlConnection = (SqlConnection)Connection;
                var sqlTransaction = (SqlTransaction)Transaction;

                // Reuse the DataContext's connection and current transaction (if any).
                using (var bulkCopy = new SqlBulkCopy(sqlConnection, SqlBulkCopyOptions.Default, sqlTransaction))
                {
                    bulkCopy.BulkCopyTimeout = timeoutInSeconds;
                    bulkCopy.DestinationTableName = metaTable.TableName;

                    // Map columns by name rather than by ordinal position.
                    foreach (var dataColumn in dataColumns)
                        bulkCopy.ColumnMappings.Add(dataColumn.ColumnName, dataColumn.ColumnName);

                    bulkCopy.WriteToServer(dataTable);
                }
            }
            catch (Exception exception)
            {
                throw DataAccessExceptionTranslator.Translate(exception);
            }
        }
    }
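
The wrapper above handles one table at a time, while the question is about related rows that span several tables and about parsing in parallel. One way this might be approached, shown only as a rough sketch, is to assign the keys on the client (for example as GUIDs) so that child rows can reference their parent before anything is sent to the server, and to parse the files in parallel while building the tables on a single thread. Everything named here (`ParsedPage`, `BulkStaging`, the `Pages` and `Links` tables, and the toy `Parse` method standing in for a real HTML parser) is hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical shape of one parsed file; real parsed objects will differ.
public sealed class ParsedPage
{
    public Guid Id;                                   // assigned on the client, not by SQL Server
    public string Title;
    public List<string> Links = new List<string>();
}

public static class BulkStaging
{
    // Stand-in for a real HTML parser (assumption): the first line is
    // treated as the title and every following line as a link.
    public static ParsedPage Parse(string html)
    {
        var lines = html.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return new ParsedPage
        {
            Id = Guid.NewGuid(),
            Title = lines.Length > 0 ? lines[0].Trim() : string.Empty,
            Links = lines.Skip(1).Select(line => line.Trim()).ToList()
        };
    }

    // Parses the documents in parallel, then fills one DataTable per
    // destination table ("Pages" as parent, "Links" as child).
    public static DataSet Stage(IEnumerable<string> documents)
    {
        var parsed = new ConcurrentBag<ParsedPage>();
        Parallel.ForEach(documents, html => parsed.Add(Parse(html)));

        var pages = new DataTable("Pages");
        pages.Columns.Add("Id", typeof(Guid));
        pages.Columns.Add("Title", typeof(string));

        var links = new DataTable("Links");
        links.Columns.Add("Id", typeof(Guid));
        links.Columns.Add("PageId", typeof(Guid));    // foreign key to Pages.Id
        links.Columns.Add("Url", typeof(string));

        // DataTable is not safe for concurrent writes, so rows are added on one thread.
        foreach (var page in parsed)
        {
            pages.Rows.Add(page.Id, page.Title);
            foreach (var url in page.Links)
                links.Rows.Add(Guid.NewGuid(), page.Id, url);
        }

        var set = new DataSet();
        set.Tables.Add(pages);
        set.Tables.Add(links);
        return set;
    }
}
```

Each staged table could then be passed to `SqlBulkCopy.WriteToServer`, parent table first (`Pages` before `Links`) and inside one transaction, so that foreign key constraints are satisfied. Client-generated GUID keys are a design choice made for this sketch; with identity columns the generated parent keys would have to be read back from the server before the child rows could be written.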