Fastest way to add new rows to a database table that may contain duplicates

Date: 2013-11-11 16:53:41

Tags: sql vb.net primary-key sql-insert database-table

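I have a table full of stock price data. Each row has a unique combination of ticker symbol and date. I keep loading new data by fetching CSV files that contain the daily price data for each ticker. I know the CSV files contain rows that are already in the table. I only want to add the data that does not yet exist in the table. What is the fastest way to do this?

Should I try to add every row and catch each exception? Or should I compare each row against my table by reading the table first, to see whether the row already exists? Or is there another option?

Additional information

This is what I had been doing. For each row in the CSV file, I queried my table to see whether the row already existed.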

    Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Dim strBuffer As String = RequestWebData(strURL)
    Dim lines As New List(Of String)
    Using sReader As New StringReader(strBuffer)
        Do While sReader.Peek() >= 0
            lines.Add(sReader.ReadLine())
        Loop
    End Using
    lines.RemoveAt(0) ' drop the CSV header row
    For Each line In lines
        Dim fields As String() = line.Split(","c)
        Dim checkDate As String = fields(0).Trim()
        Dim exists As Boolean
        ' Look up the (Ticker, Date) pair before inserting
        Using cmd2 As New OleDbCommand("SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?", con)
            cmd2.Parameters.AddWithValue("?", tickerValue)
            cmd2.Parameters.AddWithValue("?", checkDate)
            Using dr As OleDbDataReader = cmd2.ExecuteReader()
                exists = dr.Read()
            End Using
        End Using
        If Not exists Then
            Using cmd3 As New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = checkDate
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = fields(1).Trim()
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = fields(2).Trim()
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = fields(3).Trim()
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = fields(4).Trim()
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = fields(5).Trim()
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = fields(6).Trim()
                cmd3.ExecuteNonQuery()
            End Using
        End If
    Next

This is what I switched to, and it throws this exception: "The changes you requested to the table were not successful because they would create duplicate values in the index, primary key, or relationship. Change the data in the field or fields that contain duplicate data, remove the index, or redefine the index to permit duplicate entries and try again." I could catch this exception every time and ignore it until I reach a new row.

    Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Debug.WriteLine(strURL)
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            reader.ReadHeaderRecord()
            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

I just want to use the most efficient method.

Update

Based on the answer below, this is my code so far:

    Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            ' the CSV file has a header record, so we read that first
            reader.ReadHeaderRecord()

            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & "(Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & "FROM DUAL " & "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

It gives me this error: Data type mismatch in criteria expression.

1 Answer:

Answer 0: (score: 1)

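Many DBMSs support a (non-standard) clause of the INSERT command that ignores duplicates, for example:

MySQL: INSERT IGNORE INTO ...

SQLite: INSERT OR IGNORE INTO ...

This is the fastest approach in non-batch mode, because you do not have to read the database before writing.

You can do the same with the following standard SQL: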

INSERT INTO ... 
SELECT <your values> 
WHERE NOT EXISTS ( <query for your values by id> );

Or (when a FROM clause is explicitly required):

INSERT INTO ... 
SELECT <your values> 
FROM DUAL 
WHERE NOT EXISTS ( <query for your values by id> );

Edit

MS Access has no built-in DUAL table (that is, a table containing exactly one row), but Access requires a FROM clause. So you have to create your own DUAL table:

CREATE TABLE DUAL (DUMMY INTEGER);
INSERT INTO DUAL VALUES (1);
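
If you would rather create the table from VB.NET than by hand, the following is a minimal sketch of one way to do it. `DualSetup` and `EnsureDualTable` are hypothetical names, and the sketch assumes `con` is an already open `OleDbConnection` to the Access database. It checks the schema first so that it can be called repeatedly without creating the table twice.

```vbnet
Imports System.Data
Imports System.Data.OleDb

Module DualSetup

    ' Hypothetical helper: creates the one-row DUAL table if it is missing.
    ' Assumes con is an open connection to the Access database.
    Public Sub EnsureDualTable(ByVal con As OleDbConnection)
        ' Restrictions for the "Tables" schema collection:
        ' catalog, schema, table name, table type.
        Dim restrictions As String() = {Nothing, Nothing, "DUAL", "TABLE"}
        Dim found As DataTable = con.GetSchema("Tables", restrictions)
        If found.Rows.Count > 0 Then
            Return ' DUAL already exists, nothing to do
        End If

        Using create As New OleDbCommand("CREATE TABLE DUAL (DUMMY INTEGER)", con)
            create.ExecuteNonQuery()
        End Using
        ' Exactly one row, so SELECT ... FROM DUAL yields exactly one result row.
        Using fill As New OleDbCommand("INSERT INTO DUAL VALUES (1)", con)
            fill.ExecuteNonQuery()
        End Using
    End Sub

End Module
```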

You only need to do this once. Then, in your code, you would perform an insert like this:

INSERT INTO MyTable (A,B,C,D)
SELECT 123, 456, 'Hello', 'World'
FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = 123 AND B = 456);

So, for your example, use:

Dim cmd3 As OleDbCommand = New OleDbCommand( _
    "INSERT INTO " & tblName & _
    " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
    "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & _
    "FROM DUAL " & _
    "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ? AND ...)", con)

(The WHERE clause depends on your key columns.)
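
As a rough illustration of how the parameters for such a statement could be bound, here is a minimal sketch. It rests on assumptions the question does not confirm: that the key is exactly (Ticker, [Date]), that [Date] is a Date/Time column, that the price columns are Double and Volume is a Long Integer, and that the one-row DUAL table already exists. `StockInsert`, `QuoteRow`, and `InsertIfMissing` are hypothetical names. OleDb binds `?` placeholders by position only, so the two key values are added a second time at the end for the NOT EXISTS subquery. If the columns really are typed this way, passing typed values instead of strings is one possible way to avoid a "Data type mismatch in criteria expression" error.

```vbnet
Imports System.Data.OleDb

Module StockInsert

    ' Hypothetical container for one CSV row after conversion to typed values.
    Public Class QuoteRow
        Public Property TradeDate As Date
        Public Property OpenPrice As Double
        Public Property HighPrice As Double
        Public Property LowPrice As Double
        Public Property ClosePrice As Double
        Public Property Volume As Integer
        Public Property AdjClose As Double
    End Class

    ' Inserts the row only if no row with the same (Ticker, [Date]) exists.
    ' Returns True when a row was actually inserted.
    ' tblName is concatenated into the SQL text, so it must come from trusted code.
    Public Function InsertIfMissing(ByVal con As OleDbConnection, ByVal tblName As String, _
                                    ByVal ticker As String, ByVal row As QuoteRow) As Boolean
        Dim sql As String = _
            "INSERT INTO " & tblName & _
            " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
            "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & _
            "FROM DUAL " & _
            "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & _
            " WHERE Ticker = ? AND [Date] = ?)"

        Using cmd As New OleDbCommand(sql, con)
            ' Values for the SELECT list, in column order.
            ' Parameter names are ignored by OleDb; only the order matters.
            cmd.Parameters.Add("@Ticker", OleDbType.VarWChar).Value = ticker
            cmd.Parameters.Add("@Date", OleDbType.Date).Value = row.TradeDate
            cmd.Parameters.Add("@Open", OleDbType.Double).Value = row.OpenPrice
            cmd.Parameters.Add("@High", OleDbType.Double).Value = row.HighPrice
            cmd.Parameters.Add("@Low", OleDbType.Double).Value = row.LowPrice
            cmd.Parameters.Add("@Close", OleDbType.Double).Value = row.ClosePrice
            cmd.Parameters.Add("@Volume", OleDbType.Integer).Value = row.Volume
            cmd.Parameters.Add("@AdjClose", OleDbType.Double).Value = row.AdjClose
            ' Key values again, for the two placeholders in the subquery.
            cmd.Parameters.Add("@KeyTicker", OleDbType.VarWChar).Value = ticker
            cmd.Parameters.Add("@KeyDate", OleDbType.Date).Value = row.TradeDate

            ' 1 row affected means the row was new; 0 means it already existed.
            Return cmd.ExecuteNonQuery() > 0
        End Using
    End Function

End Module
```

The CSV fields arrive as strings, so they would need to be converted before filling a `QuoteRow`, for example with `Date.ParseExact` and `Double.Parse` using `CultureInfo.InvariantCulture`. The exact date format of the feed is an assumption you would have to check against your data.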