Seeking a less costly solution for matching the Ids of two tables

Time: 2015-07-30 02:59:22

Tags: c# sql asp.net-mvc entity-framework csv

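The application I am building lets users upload a .csv file containing multiple rows and columns of data. Each row contains a unique varchar Id. The data will ultimately populate fields of an existing SQL table wherever a matching Id exists.

Step 1: I use LinqToCsv and a foreach loop to import the .csv in full into a temporary table.

Step 2: I then have another foreach loop in which I try to copy the rows from the temporary table into the existing table wherever the Ids match.

The controller action that performs this process: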

[HttpPost]
public ActionResult UploadValidationTable(HttpPostedFileBase csvFile)
{
    var inputFileDescription = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = true
    };
    var cc = new CsvContext();
    var filePath = uploadFile(csvFile.InputStream);
    var model = cc.Read<Credit>(filePath, inputFileDescription);

    try
    {
        var entity = new TestEntities();
        var tc = new TemporaryCsvUpload();
        foreach (var item in model)
        {

            tc.Id = item.Id;
            tc.CreditInvoiceAmount = item.CreditInvoiceAmount;
            tc.CreditInvoiceDate = item.CreditInvoiceDate;
            tc.CreditInvoiceNumber = item.CreditInvoiceNumber;
            tc.CreditDeniedDate = item.CreditDeniedDate;
            tc.CreditDeniedReasonId = item.CreditDeniedReasonId;
            tc.CreditDeniedNotes = item.CreditDeniedNotes;
            entity.TemporaryCsvUploads.Add(tc);
        }

        var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);

        foreach (var number in idMatches)
        {
            number.CreditInvoiceDate = tc.CreditInvoiceDate;
            number.CreditInvoiceNumber = tc.CreditInvoiceNumber;
            number.CreditInvoiceAmount = tc.CreditInvoiceAmount;
            number.CreditDeniedDate = tc.CreditDeniedDate;
            number.CreditDeniedReasonId = tc.CreditDeniedReasonId;
            number.CreditDeniedNotes = tc.CreditDeniedNotes;
        }
        entity.SaveChanges();
        entity.Database.ExecuteSqlCommand("TRUNCATE TABLE TemporaryCsvUpload");

        TempData["Success"] = "Updated Successfully";

    }
    catch (LINQtoCSVException)
    {
        TempData["Error"] = "Upload Error: Ensure you have the correct header fields and that the file is of .csv format.";
    }

    return View("Upload");
}

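The problem in the code above is that tc is assigned inside the first loop, but the match is defined after the loop with var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);, so I only get the last item of the first loop.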
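A minimal sketch of that symptom, using a hypothetical in-memory Row class in place of the EF entity (Row, ReuseOneInstance, and NewInstancePerItem are names made up for illustration, not part of the application):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the TemporaryCsvUpload entity (illustration only).
public class Row
{
    public string Id { get; set; }
}

public static class SharedInstanceDemo
{
    // Mirrors the loop in the action: one object is created before the loop and reused.
    public static List<Row> ReuseOneInstance(IEnumerable<string> ids)
    {
        var rows = new List<Row>();
        var row = new Row();
        foreach (var id in ids)
        {
            row.Id = id;   // overwrites the same object on every iteration
            rows.Add(row); // adds another reference to that same object
        }
        return rows;       // every entry now carries the last Id
    }

    // Creates a fresh object per item, so each Id is preserved.
    public static List<Row> NewInstancePerItem(IEnumerable<string> ids)
    {
        return ids.Select(id => new Row { Id = id }).ToList();
    }
}
```

Calling ReuseOneInstance with "A", "B", "C" yields three references to one object whose Id is "C", which is the same reason tc.Id holds only the last Id once the loop has finished.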

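If I nest the second loop inside the first, it becomes far too slow (I stopped it after 10 minutes), because the .csv has about 1,000 rows and the pre-existing table has 7,000 rows.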

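Finding a better way has been bugging me. Pretend the temporary table doesn't even come from a .csv, and just consider the most efficient way to fill rows in table 2 from table 1 where the row's Id matches. Thanks for your help!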

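3 Answers:

Answer 0: (score: 3)

As your code is currently written, the application does most of the work, which SQL Server could do far more efficiently. You are making hundreds of unnecessary round trips to the database. When you import data in bulk, you want a solution like this: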

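  1. Bulk import the data. For useful guidance on bulk-import efficiency with EF, see this answer.
  2. Join and update the destination table.
  3. Processing the import should require only a single bulk update query: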

    update PT set
       CreditInvoiceDate = CSV.CreditInvoiceDate
      ,CreditInvoiceNumber = CSV.CreditInvoiceNumber
      ,CreditInvoiceAmount = CSV.CreditInvoiceAmount
      ,CreditDeniedDate = CSV.CreditDeniedDate
      ,CreditDeniedReasonId = CSV.CreditDeniedReasonId
      ,CreditDeniedNotes = CSV.CreditDeniedNotes
    from PreexistingTable PT
    join TemporaryCsvUploads CSV on PT.Id = CSV.Id
    

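    This query replaces the entire nested loop and applies the same updates in a single database call. As long as your tables are properly indexed, this should be very fast.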
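To illustrate what the join-based update computes, here is a rough in-memory analogue written with LINQ's Join. It is only a sketch under assumed types: CreditRow, JoinUpdateSketch, and the decimal? amount are hypothetical, and in the real application the SQL statement above should do this work on the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; the property types are assumptions, not taken from the real schema.
public class CreditRow
{
    public string Id { get; set; }
    public string CreditInvoiceNumber { get; set; }
    public decimal? CreditInvoiceAmount { get; set; }
}

public static class JoinUpdateSketch
{
    // Copies staged values onto existing rows with the same Id.
    // Returns the number of matched (existing, staged) pairs.
    public static int ApplyStaged(IEnumerable<CreditRow> existing, IEnumerable<CreditRow> staged)
    {
        // Enumerable.Join builds a lookup over the staged rows and streams the
        // existing rows past it, so neither side is rescanned per row.
        var pairs = existing
            .Join(staged, pt => pt.Id, csv => csv.Id,
                  (pt, csv) => new { Target = pt, Source = csv })
            .ToList();

        foreach (var pair in pairs)
        {
            pair.Target.CreditInvoiceNumber = pair.Source.CreditInvoiceNumber;
            pair.Target.CreditInvoiceAmount = pair.Source.CreditInvoiceAmount;
        }
        return pairs.Count;
    }
}
```

Rows without a counterpart on the other side are left untouched, which is the behaviour of the inner join in the UPDATE statement.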

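Answer 1: (score: 1)

After saving the CSV records into a second table that has the same fields as the main table, execute the following procedure in SQL Server: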

create proc [dbo].[excel_updation]
  as
set xact_abort on

begin transaction
-- First update records
update first_table
   set [ExamDate]      = source.[ExamDate],
       [marks]      = source.[marks],
       [result]      = source.[result],
       [dob] = source.[dob],
       [spdate]      = source.[spdate],
       [agentName]      = source.[agentName],
       [companycode]      = source.[companycode],
       [dp]      = source.[dp],
       [state]      = source.[state],
       [district]      = source.[district],
       [phone]      = source.[phone],
       [examcentre]      = source.[examcentre],
       [examtime]      = source.[examtime],
       [dateGiven]      = source.[dateGiven],
       [smName]      = source.[smName],
       [smNo]      = source.[smNo],
       [bmName]      = source.[bmName],
       [bmNo]      = source.[bmNo]
  from first_table
 inner join second_table source
    on first_table.[UserId]     = source.[UserId]

-- And then insert

insert into first_table (ExamDate, marks, result, dob, spdate, agentName, companycode, dp, state, district, phone, examcentre, examtime, dateGiven, smName, smNo, bmName, bmNo)
select   [ExamDate], [marks], [result], [dob], [spdate], [agentName], [companycode], [dp], [state], [district], [phone], [examcentre], [examtime], [dateGiven], [smName], [smNo], [bmName], [bmNo]
  from second_table source
 where not exists
       (
          select *
            from first_table
           where first_table.[UserId]     = source.[UserId]
       )

commit transaction

delete from second_table

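The only requirement for this code is that both tables hold matching data in the id column. Wherever an id matches in both tables, that row's data is updated in the first table.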
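The update-then-insert pattern of the procedure can be mimicked in memory as follows. This is a sketch only: ExamRecord, UpsertSketch, and the int types are hypothetical, only one data column is shown, and UserId is assumed to be unique within the source rows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record with a single data column; the types are assumptions.
public class ExamRecord
{
    public int UserId { get; set; }
    public int Marks { get; set; }
}

public static class UpsertSketch
{
    // Updates target rows whose UserId appears in source and appends the rest.
    // Returns how many rows were updated and how many were inserted.
    public static (int Updated, int Inserted) Upsert(List<ExamRecord> target, IEnumerable<ExamRecord> source)
    {
        // Index the target once so each source row needs a single lookup.
        var byId = new Dictionary<int, ExamRecord>();
        foreach (var row in target) byId[row.UserId] = row;

        int updated = 0, inserted = 0;
        foreach (var src in source)
        {
            if (byId.TryGetValue(src.UserId, out var existing))
            {
                existing.Marks = src.Marks; // the "update" branch
                updated++;
            }
            else
            {
                var copy = new ExamRecord { UserId = src.UserId, Marks = src.Marks };
                target.Add(copy);           // the "insert where not exists" branch
                byId[copy.UserId] = copy;
                inserted++;
            }
        }
        return (updated, inserted);
    }
}
```

Unlike the stored procedure, this sketch has no transaction, so it only illustrates the matching logic and not the atomicity.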

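Answer 2: (score: 0)

As long as the probability of a match is high, you can simply attempt an update with every row from the CSV, conditioned on the ID matching: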

UPDATE table SET ... WHERE id = @id